Python BeautifulSoup web scraping works the first time but not the second or any following times

Question

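I am new to Python and am trying to write a web scraping script to collect some price data. The site I am trying to scrape is, for example: https://www.medizinfuchs.de/?params%5Bsearch%5D=10192710&params%5Bsearch_cat%5D=1

I am using the following code: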

from bs4 import BeautifulSoup
import requests

URL = "https://www.medizinfuchs.de/?params%5Bsearch%5D=11484834&params%5Bsearch_cat%5D=1"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

for price in soup.select('li.apotheke div.price'):
        print(float(price.text.strip(' \t\n€').replace(',', '.')))

for name in soup.select('li.apotheke a.name'):
        print(str(name.text.strip(' \t\n€')))

The first time I ran it, it worked like a charm, but since then I get no output at all...

The output I expect:

5.39 5.4 5.4 5.65 5.8 5.89 5.89 5.94 ApothekePrime Apoversand24.de bon-vita.de 1-apo.de eurapon.de docmorris.de sternapo ahorn24.de

Can you help me get it to work consistently?

Thanks

Answer 1

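What happens?

Take a look at your soup, it tells the truth. The soup contains no <li> element with the class apotheke, so the selector li.apotheke matches nothing and you get no results.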
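
One way to confirm this is to count how many elements each selector matches. The following is a minimal sketch; SAMPLE_HTML is made-up markup used only for illustration (the real page may be structured differently), and count_matches is a hypothetical helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the live page may differ.
SAMPLE_HTML = """
<div class="apotheke">
  <a class="name">ApothekePrime</a>
  <div class="price">5,39 €</div>
</div>
"""


def count_matches(html, selectors):
    """Return a dict mapping each CSS selector to its number of matches."""
    soup = BeautifulSoup(html, "html.parser")
    return {sel: len(soup.select(sel)) for sel in selectors}


counts = count_matches(SAMPLE_HTML, ["li.apotheke", ".apotheke"])
print(counts)  # {'li.apotheke': 0, '.apotheke': 1}
```

A count of 0 for a selector means the loop over its results never runs, which would explain a script that prints nothing without raising an error.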

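How to fix?

Select the correct tags, or skip the tags and select by class only (not the best idea, since class names change often, but in this case it is the best you can do):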

for e in soup.select('.apotheke'):
    print(e.select_one('.price').get_text(strip=True).split(' ')[0])
    
for e in soup.select('.apotheke'):
    print(e.select_one('.name').get_text(strip=True))
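
Note that select_one returns None when nothing matches, so calling get_text on the result raises an AttributeError if an entry lacks a name or price. A small guard can avoid that. This is only a sketch: text_or_none is a hypothetical helper, and the markup (including the entry "NoPriceShop") is invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second entry deliberately has no price element.
SAMPLE_HTML = """
<div class="apotheke"><a class="name">ApothekePrime</a><div class="price">5,39 €</div></div>
<div class="apotheke"><a class="name">NoPriceShop</a></div>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = [
    (text_or_none(e, ".name"), text_or_none(e, ".price"))
    for e in soup.select(".apotheke")
]
print(rows)
```

Entries with a missing element then show up as None instead of stopping the whole loop.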

Example (more structured)

data=[]
for e in soup.select('.apotheke'):
    data.append({
        'name':e.select_one('.name').get_text(strip=True),
        'price':e.select_one('.price').get_text(strip=True).split(' ')[0]
    })
data

Output

[{'name': 'ApothekePrime', 'price': '5,39'},
 {'name': 'Apoversand24.de', 'price': '5,40'},
 {'name': 'bon-vita.de', 'price': '5,40'},
 {'name': '1-apo.de', 'price': '5,65'},
 {'name': 'eurapon.de', 'price': '5,80'},
 {'name': 'docmorris.de', 'price': '5,89'},
 {'name': 'sternapo', 'price': '5,89'},
 {'name': 'ahorn24.de', 'price': '5,94'}]
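
The prices above are strings with a decimal comma, while the question prints floats. A possible conversion step is sketched below; parse_price is a hypothetical helper, and it assumes German number formatting (comma as decimal separator, dot as thousands separator).

```python
def parse_price(text):
    """Convert a German-formatted price such as '5,39' or '5,39 €' to a float."""
    # Remove the currency sign and surrounding whitespace.
    cleaned = text.replace("€", "").strip()
    # Assumption: '.' is a thousands separator and ',' the decimal separator.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return float(cleaned)


# Two entries copied from the output above.
data = [
    {"name": "ApothekePrime", "price": "5,39"},
    {"name": "Apoversand24.de", "price": "5,40"},
]
prices = [parse_price(d["price"]) for d in data]
print(prices)  # [5.39, 5.4]
```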
