Web scraping Amazon with BeautifulSoup
I am trying to web scrape Amazon reviews: https://www.amazon.com/Python-Crash-Course-Hands-Project-Based/dp/1593276036/ref=sr_1_3?ie=UTF8&qid=1541450645&sr=8-3&keywords=python
Here is my code:
import requests as req
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Kevin\'s_request'}
r = req.get('https://www.amazon.com/Python-Crash-Course-Hands-Project-Based/dp/1593276036/ref=sr_1_3?ie=UTF8&qid=1541450645&sr=8-3&keywords=python', headers=headers)
soup = BeautifulSoup(r.text, "html.parser")
soup.find(class_="a-expander-content a-expander-partial-collapse-content")
I only get an empty list. I am using Python 3.6.4 in a Jupyter notebook with BS 4.
Not sure what is happening on your side, but this code works fine for me.
Here it is (Python 3.6, BS4 4.6.3):
import requests
from bs4 import BeautifulSoup

def s_comments(url):
    headers = {'User-Agent': 'Bob\'s_request'}
    response = requests.get(url, headers=headers)
    # Stop early if Amazon did not return the page.
    if response.status_code != 200:
        raise ConnectionError
    # Name the parser explicitly instead of letting bs4 pick one.
    soup = BeautifulSoup(response.content, "html.parser")
    return soup.find_all(class_="a-expander-content a-expander-partial-collapse-content")

url = 'https://www.amazon.com/dp/1593276036'
reviews = s_comments(url)
for i, review in enumerate(reviews):
    print('---- {} ----'.format(i))
    print(review.text)
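One thing worth knowing about the `class_` lookup above: when the string contains several class names, BeautifulSoup compares it against the whole `class` attribute value, so order and extra classes matter. The sketch below illustrates this on a small hand-written HTML snippet (the markup is hypothetical, not copied from Amazon) and compares it with a CSS selector via `select`, which only requires that both classes be present.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same two classes in different orders, plus one extra class.
html = """
<div class="a-expander-content a-expander-partial-collapse-content">first review</div>
<div class="a-expander-partial-collapse-content a-expander-content">second review</div>
<div class="a-expander-content extra a-expander-partial-collapse-content">third review</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A multi-class string passed to class_ is matched against the full attribute value.
exact = soup.find_all(class_="a-expander-content a-expander-partial-collapse-content")

# A CSS selector requires both classes but ignores their order and any extras.
by_css = soup.select("div.a-expander-content.a-expander-partial-collapse-content")

exact_texts = [tag.get_text() for tag in exact]
css_texts = [tag.get_text() for tag in by_css]
print(exact_texts)
print(css_texts)
```

If the page's class order differs from what you typed, the `select` form is likely the more forgiving of the two.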
Try this approach. It turns out your selector does not match anything. I have changed it so that it does the job:
import requests
from bs4 import BeautifulSoup

def get_reviews(s, url):
    s.headers['User-Agent'] = 'Mozilla/5.0'
    response = s.get(url)
    soup = BeautifulSoup(response.text, "lxml")
    return soup.find_all("div", {"data-hook": "review-collapsed"})

if __name__ == '__main__':
    link = 'https://www.amazon.com/Python-Crash-Course-Hands-Project-Based/dp/1593276036/ref=sr_1_3?ie=UTF8&qid=1541450645&sr=8-3&keywords=python'
    with requests.Session() as s:
        for review in get_reviews(s, link):
            print(f'{review.text}\n')
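To try the `data-hook` lookup without hitting Amazon, here is a minimal offline sketch. The HTML snippet and the helper name `review_texts` are made up for illustration; only the `data-hook="review-collapsed"` attribute comes from the answer above. It also shows that `find` returns `None` when nothing matches, whereas `find_all` returns an empty list, which may be relevant to the "empty list" in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded product page.
html = """
<div data-hook="review-collapsed"><span>  Great book for beginners. </span></div>
<div data-hook="review-collapsed"><span>Clear examples.</span></div>
<div data-hook="other">Not a review</div>
"""

soup = BeautifulSoup(html, "html.parser")

def review_texts(soup):
    """Return the stripped text of every div whose data-hook is 'review-collapsed'."""
    blocks = soup.find_all("div", attrs={"data-hook": "review-collapsed"})
    return [block.get_text(strip=True) for block in blocks]

texts = review_texts(soup)
for text in texts:
    print(text)

# With no match, find gives None and find_all gives an empty list.
missing_one = soup.find("div", attrs={"data-hook": "no-such-hook"})
missing_all = soup.find_all("div", attrs={"data-hook": "no-such-hook"})
print(missing_one, missing_all)
```

Checking the result for `None` or an empty list before printing makes it easier to tell a selector problem from a blocked or changed page.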