Python web scraper not working for TripAdvisor

I am trying to write a simple Python scraper to save all the reviews for a specific place on TripAdvisor.

The specific link I am using as an example is:

https://www.tripadvisor.com/Attraction_Review-g319796-d5988326-Reviews-or50-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html

This is the code I am using, which should print the corresponding HTML:

from bs4 import BeautifulSoup
import requests

url = "https://www.tripadvisor.com/Attraction_Review-g319796-d5988326-Reviews-or50-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html"

r = requests.get(url)
data = r.text
soup = BeautifulSoup(data)
print(soup)

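If I run this code in the console, it hangs on requests.get(url) for a long time without any output. With another URL (for example url = "https://whosebug.com/"), I immediately get the HTML displayed correctly. Why does it not work for TripAdvisor? How can I get its HTML?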

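Adding a user-agent should solve your problem as a first step, because some websites serve different content depending on it or use it for bot/automation detection. Open DevTools in your browser and copy the user-agent from one of your requests: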

headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get(url,headers=headers)
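
Since the original request hung without any output, it may also help to set a timeout, because requests does not apply one by default. Below is a minimal sketch, not a definitive implementation: the helper names build_session and fetch_html and the 10-second timeout are assumptions chosen for illustration.

```python
import requests


def build_session(user_agent="Mozilla/5.0"):
    """Return a Session that sends the given User-Agent with every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_html(session, url, timeout=10):
    """Fetch a page and return its HTML.

    The timeout (in seconds, an assumed value) makes a stalled request raise
    requests.exceptions.Timeout instead of blocking indefinitely.
    """
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text
```

With this in place, fetch_html(build_session(), url) either returns the HTML or fails with an exception that says what went wrong.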

Example

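from bs4 import BeautifulSoup
import requests

url = "https://www.tripadvisor.com/Attraction_Review-g319796-d5988326-Reviews-or50-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html"
# Send a browser-like User-Agent with the request
headers = {'User-Agent': 'Mozilla/5.0'}

r = requests.get(url, headers=headers)
# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(r.text, 'html.parser')
data = []

# Each review is a card inside the reviews tab container
for e in soup.select('#tab-data-qa-reviews-0 [data-automation="reviewCard"]'):
    data.append({
        # The rating is stored in the aria-label of the bubbles icon
        'rating': e.select_one('svg[aria-label]')['aria-label'],
        'profilUrl': e.select_one('a[tabindex="0"]').get('href'),
        # The review text is in the second div after the one wrapping the link
        'content': e.select_one('div:has(>a[tabindex="0"]) + div + div').text
    })

data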

Output

[{'rating': '5.0 of 5 bubbles',
  'profilUrl': '/ShowUserReviews-g319796-d5988326-r620396152-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html',
  'content': "We were fortunate to get in without pre-booking.What a find. A UNESCO site in the middle of the countryside.The replication cave is so awesome and authentic, hard to believe it's not the real thing.The museum is beautifully curated, great for students, and anyone interested in archeology and the beginnings of human existence.Definitely worth visiting. We nearly missed out Read more"},
 {'rating': '5.0 of 5 bubbles',
  'profilUrl': '/ShowUserReviews-g319796-d5988326-r618358203-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html',
  'content': 'Beautiful site with great replica’s of the original cave, excellent exposition, poor film as an introduction however!The most urgent issue: long waiting because you need a slot to enter. This could be done 1000% better and in every decent museum it is done better! Staff probably civil servants with no great desire to make you enjoy the visit. Building urgently needs a revamp, no exposure at all!Read more'},...]
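
The scraped records still contain the rating as text ('5.0 of 5 bubbles') and a trailing 'Read more' caption in the content. If you want cleaner values, a small post-processing step could look like the sketch below. The function names parse_rating, strip_read_more and clean_review are hypothetical helpers, not part of any library, and the patterns only assume the shapes visible in the output above.

```python
import re


def parse_rating(label):
    """Extract the leading number from a label such as '5.0 of 5 bubbles'."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)", label)
    return float(match.group(1)) if match else None


def strip_read_more(text):
    """Remove a trailing 'Read more' caption, if present."""
    return re.sub(r"\s*Read more\s*$", "", text)


def clean_review(review):
    """Return a copy of one scraped record with normalized fields."""
    return {
        "rating": parse_rating(review["rating"]),
        "profilUrl": review["profilUrl"],  # key spelled as in the example above
        "content": strip_read_more(review["content"]),
    }
```

Applied to the list from the example, [clean_review(r) for r in data] would give numeric ratings and content without the link caption. The page markup can change, so the patterns may need adjusting.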