Scrape the href of each product from a website using Python
I am using BeautifulSoup to scrape the href of each product on this page: http://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=digital+camera. The hrefs I want end with "keywords=digital+camera".
Here is my code:
from bs4 import BeautifulSoup
import requests

url = "http://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=digital+camera"
keyword = "keywords=digital+camera"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data)
for link in soup.find_all('a'):
    href = link.get('href')
    if href is None:
        continue
    elif keyword in href:
        print href
The script above prints nothing. What can I do to fix it?
Thanks
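Amazon inspects the user agent (essentially the name of your browser) and changes the content it returns based on that value. If you add a User-Agent header to the request, the links in the response include the string "keywords=digital+camera"; without the header, they do not.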
import urllib2  # Python 2 only; in Python 3 this functionality lives in urllib.request
from bs4 import BeautifulSoup

url = "http://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=digital+camera"
# Send a browser-like User-Agent header with the request
req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
html = urllib2.urlopen(req).read()
soup = BeautifulSoup(html)
links = soup.find_all('a')
for l in links:
    href = l.get('href')
    title = l.get('title', '')
    if 'Sony W800/B 20.1 MP Digital' in title:
        print(href)  # prints: http://www.amazon.com/Sony-W800-Digital-Camera-Black/dp/B00I8BIBCW/ref=sr_1_2/183-0842534-8993425?s=photo&ie=UTF8&qid=1421400650&sr=1-2&keywords=digital+camera
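If you are on Python 3, where urllib2 no longer exists, the same idea can be sketched with the standard library alone. This is only a rough sketch under a few assumptions: it swaps BeautifulSoup for the built-in html.parser module to stay dependency-free, the names LinkCollector, matching_links and fetch are made up for illustration, and Amazon may still return different markup or a robot check even with the header set. It filters on the "keywords=digital+camera" substring from the question rather than on a product title.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:  # skip anchors that have no href
                self.hrefs.append(href)


def matching_links(html, keyword):
    """Return every href found in html that contains keyword."""
    collector = LinkCollector()
    collector.feed(html)
    return [h for h in collector.hrefs if keyword in h]


def fetch(url, user_agent="Mozilla/5.0"):
    """Download url while sending an explicit User-Agent header."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req) as resp:
        # Assumes the page is UTF-8; undecodable bytes are replaced
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    url = "http://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=digital+camera"
    for href in matching_links(fetch(url), "keywords=digital+camera"):
        print(href)
```

Keeping the download and the link filtering in separate functions means you can check the filtering against a saved copy of the page without hitting the site again.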