Need to scrape a webpage using BeautifulSoup and find data-rich nodes, especially tables, using Python

Question

I have the following code, but it raises an error.

import requests
from bs4 import BeautifulSoup
url = "http://www.amazon.com/Harry-Potter-And-Chamber-Secrets/dp/0439064872/ref=pd_bxgy_b_img_y"
r = requests.get(url)
html = BeautifulSoup(r.content)
links = html.find("table",{"class":"bucket"}).find_all("h2",{"class":"content"})

print links

I get the following error:

Traceback (most recent call last):
  File "C:/Users/pgadmin/Google Drive/share sem2/SEMINAR/4.py", line 52, in <module>
    links = html.find("table",{"class":"bucket"}).find_all("h2",{"class":"content"})
AttributeError: 'NoneType' object has no attribute 'find_all'

I am trying to get the data inside the elements that have the class bucket.

Answer 1

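The tag is wrong. No table element on the page has the class bucket, so find() returns None, and calling find_all() on None raises the AttributeError. The data you want is inside a td tag. Use it like this: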

bucket = html.find("td", attrs={"class":"bucket"})
links = [a.get('href') for a in bucket.find_all('a')]
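
If the page layout changes, or the request returns a different page than expected, find() can return None again. Below is a minimal sketch of the same lookup with a guard for that case. The function name extract_bucket_links and the SAMPLE_HTML snippet are hypothetical, made up for illustration so the example runs without a network request; the real Amazon markup may look different.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (assumption: the real
# markup may differ). It only mimics a <td> carrying the class "bucket".
SAMPLE_HTML = """
<html><body>
  <table><tr>
    <td class="bucket">
      <h2>Product Details</h2>
      <div class="content">
        <a href="/example/paperback">Paperback</a>
        <a href="/example/hardcover">Hardcover</a>
        <a>No link here</a>
      </div>
    </td>
  </tr></table>
</body></html>
"""


def extract_bucket_links(markup):
    """Return the href of every link inside the first <td class="bucket">."""
    soup = BeautifulSoup(markup, "html.parser")
    bucket = soup.find("td", attrs={"class": "bucket"})
    if bucket is None:
        # find() returns None when nothing matches, so stop here instead of
        # calling find_all() on None and raising AttributeError.
        return []
    # href=True skips <a> tags that have no href attribute.
    return [a["href"] for a in bucket.find_all("a", href=True)]


links = extract_bucket_links(SAMPLE_HTML)
print(links)
```

With a live page you would pass r.content instead of SAMPLE_HTML. An empty list then tells you the selector did not match, which is a hint to inspect the HTML you actually received.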


tags

beautifulsoup

web-scraping

python-2.7