How do I properly use the find function from BeautifulSoup4 in Python 3?

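I'm following a YouTube tutorial on how to scrape an Amazon product page. First, I'm trying to get the product name. Later I want to get the Amazon price and the used price. For this I am using requests and bs4. The code so far: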

import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.de/Teenage-Engineering-Synthesizer-FM-Radio-AMOLED-Display/dp/B00CXSJUZS/ref=sr_1_1_sspa?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=op-1&qid=1594672884&sr=8-1-spons&psc=1&smid=A1GQGGPCGF8PV9&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUFEMUZSUjhQMUM3NTkmZW5jcnlwdGVkSWQ9QTAwMzMwODkyQkpTNUJUUE9QUFVFJmVuY3J5cHRlZEFkSWQ9QTA4MzM4NDgxV1Y3UzVVN1lXTUZKJndpZGdldE5hbWU9c3BfYXRmJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ=='

headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'}

page = requests.get(URL,headers=headers)
soup = BeautifulSoup(page.content,'html.parser')


title = soup.find('span',{'id' : "productTitle"})
print(title)

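The title I get is None, so the find function did not find an element with the ID "productTitle". But inspecting soup shows that there is an element with that id.

So what is wrong with my code? I also tried: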

title = soup.find(id = "productTitle")

Try this:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.de/Teenage-Engineering-Synthesizer-FM-Radio-AMOLED-Display/dp/B00CXSJUZS/ref=sr_1_1_sspa?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=op-1&qid=1594672884&sr=8-1-spons&psc=1&smid=A1GQGGPCGF8PV9&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUFEMUZSUjhQMUM3NTkmZW5jcnlwdGVkSWQ9QTAwMzMwODkyQkpTNUJUUE9QUFVFJmVuY3J5cHRlZEFkSWQ9QTA4MzM4NDgxV1Y3UzVVN1lXTUZKJndpZGdldE5hbWU9c3BfYXRmJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ=='

headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'}

page = requests.get(URL,headers=headers)
soup = BeautifulSoup(page.content,'lxml')


title = soup.find('span',{'id' : "productTitle"})
print(title.text.strip())

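What you did was right, but you were using a "bad" parser. You can read more about the differences between parsers in the Beautiful Soup documentation. I prefer lxml, but sometimes also use html5lib. I also added

.text.strip()

to the print call, so only the title text is printed.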
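
As a small illustration of what .text.strip() does, here is a minimal sketch that runs on a static HTML string instead of the live Amazon page. The HTML snippet and the product name in it are made up for the example. It also checks for None first, because find returns None when nothing matches and calling .text on None raises an AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the part of the page that holds the title.
html = '<div><span id="productTitle">\n   Example Product Name \n</span></div>'

soup = BeautifulSoup(html, "html.parser")
title = soup.find("span", {"id": "productTitle"})

# find() returns None when no element matches, so guard before using .text.
if title is not None:
    # .text gives the text inside the tag; .strip() removes the
    # surrounding whitespace and newlines.
    title_text = title.text.strip()
else:
    title_text = None

print(title_text)
```

title.get_text(strip=True) should give the same result for a tag like this one that contains a single piece of text.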

Note: you must install lxml for Python first (for example with pip install lxml)!
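
If you are not sure which parsers are installed, or want to see whether the parser is really what makes the difference, something like the following sketch may help. The function name find_with_available_parsers and the sample HTML are my own and only for illustration. It tries each parser in turn, skips the ones that are not installed, and reports what find returns for the given id.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def find_with_available_parsers(markup, element_id,
                                parsers=("lxml", "html5lib", "html.parser")):
    """Return a dict mapping parser name to the stripped text found, or None."""
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(markup, name)
        except FeatureNotFound:
            # Raised by bs4 when the requested parser is not installed.
            results[name] = "parser not installed"
            continue
        tag = soup.find(id=element_id)
        results[name] = tag.get_text(strip=True) if tag is not None else None
    return results


# Hypothetical sample markup; for the real page you would pass page.content.
sample = '<html><body><span id="productTitle"> Example Product Name </span></body></html>'
results = find_with_available_parsers(sample, "productTitle")
for parser_name, found in results.items():
    print(parser_name, "->", found)
```

On a simple, well-formed snippet like this every installed parser should find the element. The differences tend to show up on large or malformed pages, which is why it is worth running the same check against the actual response.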