Scraping a webpage with BeautifulSoup to find data-rich nodes, especially tables, using Python

I have the following code, but it gives me an error.

import requests
from bs4 import BeautifulSoup
url = "http://www.amazon.com/Harry-Potter-And-Chamber-Secrets/dp/0439064872/ref=pd_bxgy_b_img_y"
r = requests.get(url)
html = BeautifulSoup(r.content, "html.parser")
links = html.find("table",{"class":"bucket"}).find_all("h2",{"class":"content"})

print(links)

I get the following error:

Traceback (most recent call last):
  File "C:/Users/pgadmin/Google Drive/share sem2/SEMINAR/4.py", line 52, in <module>
    links = html.find("table",{"class":"bucket"}).find_all("h2",{"class":"content"})
AttributeError: 'NoneType' object has no attribute 'find_all'

I am trying to get the data inside the element with the "bucket" class.

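The tag is wrong. The data you want is inside a td tag, not a table tag, so find() returns None and the chained find_all() call raises the AttributeError. Use it like this: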

# The "bucket" class is on a <td>, not a <table>
bucket = html.find("td", attrs={"class":"bucket"})
# Collect the href of every link inside that cell
links = [a.get('href') for a in bucket.find_all('a')]
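
If the page layout changes, find() may return None again, so it can help to guard the lookup. Below is a minimal sketch of that idea, run against a small inline HTML sample instead of the live page. The sample markup, the ids "layout" and "specs", and the helper names bucket_links and tables_by_cell_count are all made up for illustration; the cell-count ranking is only one crude way to guess which table is "data rich".

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup will differ.
SAMPLE_HTML = """
<html><body>
<table id="layout"><tr>
  <td class="bucket">
    <h2>Product Details</h2>
    <a href="/item/1">First link</a>
    <a href="/item/2">Second link</a>
    <a>anchor without href</a>
  </td>
</tr></table>
<table id="specs">
  <tr><th>Field</th><th>Value</th></tr>
  <tr><td>Format</td><td>Paperback</td></tr>
</table>
</body></html>
"""


def bucket_links(soup, tag="td", css_class="bucket"):
    """Return hrefs inside the first matching element, or [] if none exists."""
    bucket = soup.find(tag, attrs={"class": css_class})
    if bucket is None:  # find() returns None when nothing matches
        return []
    # href=True skips anchors that have no href attribute
    return [a["href"] for a in bucket.find_all("a", href=True)]


def tables_by_cell_count(soup):
    """Sort tables by how many td/th cells they contain, largest first."""
    tables = soup.find_all("table")
    return sorted(tables, key=lambda t: len(t.find_all(["td", "th"])), reverse=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
links = bucket_links(soup)                  # matches the <td class="bucket">
missing = bucket_links(soup, tag="table")   # no <table class="bucket"> exists
ranked = tables_by_cell_count(soup)

print(links)
print(missing)
print([t.get("id") for t in ranked])
```

With the real page you would build soup from r.content instead of SAMPLE_HTML. An empty list from bucket_links then tells you the selector did not match, rather than the script crashing with an AttributeError.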