I am getting the following JSON error when trying to web scrape dates from Trustpilot with BS4 - Python
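I am using the following script to scrape user review data from the Trustpilot website (https://ca.trustpilot.com/review/www.hellofresh.ca) so that I can run some analysis on user sentiment.
The fields I want to scrape are the date, the star rating, and the review content.
But when I run the code I get the following error. Can anyone help explain why?
JSONDecodeError: Expecting value: line 1 column 1 (char 0)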
```
import json
import time
from datetime import datetime

import requests
from bs4 import BeautifulSoup

# NOTE: `headers` (the request headers dict) is defined elsewhere and not shown here.

stars = []
dates = []
comments = []
results = []
with requests.Session() as s:
    for num in range(1, 2):
        url = "https://ca.trustpilot.com/review/www.hellofresh.ca?page={}".format(num)
        r = s.get(url, headers=headers)
        soup = BeautifulSoup(r.content, 'lxml')
        for star in soup.find_all("section", {"class": "review__content"}):
            # Get rating value
            rating = star.find("div", {"class": "star-rating star-rating--medium"}).find('img').get('alt')
            # Get date value
            #date_json = json.loads(star.find('script').text)
            #date = date_json['publishedDate']
            date_tag = star.select("div.review-content-header__dates > script")
            date = json.loads(date_tag[0].text)  # the JSONDecodeError is raised here
            dt = datetime.strptime(date['publishedDate'], "%Y-%m-%dT%H:%M:%SZ")
            # Get comment
            comment = star.find("div", class_="review-content__body").text
            stars.append(rating)
            dates.append(dt)
            comments.append(comment)
            data = {"Rating": rating, "Review": comment, "Dates": date}
            results.append(data)
        # Pause between pages
        time.sleep(2)
print(results)
```
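To get the JSON data, read the tag's `.string` attribute instead of `.text`. In some Beautiful Soup releases (4.9.x in particular), `.text` on a `<script>` tag returns an empty string because script contents are not treated as page text, and `json.loads("")` fails with exactly this "Expecting value: line 1 column 1 (char 0)" error.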
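The behaviour can be reproduced offline. The sketch below is only an illustration: the HTML fragment is a hypothetical stand-in modelled on the selector in the question (the live page may differ), `load_script_json` is a helper name made up for this example, and it uses the built-in `html.parser` so that lxml is not required.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the selector from the question;
# the live Trustpilot page may be structured differently.
HTML = """
<div class="review-content-header__dates">
<script type="application/json">
{"publishedDate": "2021-01-04T21:57:34+00:00", "updatedDate": null, "reportedDate": null}
</script>
</div>
"""


def load_script_json(tag):
    """Parse the JSON held in a <script> tag, or return None if it is empty."""
    raw = tag.string  # the tag's single child string, if it has exactly one
    if raw is None or not raw.strip():
        return None
    return json.loads(raw)


# Feeding json.loads an empty string produces the error quoted in the question.
try:
    json.loads("")
except json.JSONDecodeError as exc:
    error_message = str(exc)

soup = BeautifulSoup(HTML, "html.parser")
date_tag = soup.select("div.review-content-header__dates > script")
date = load_script_json(date_tag[0])

print(error_message)
print(date["publishedDate"])
```

Returning `None` for an empty tag is a design choice for this sketch: it lets the caller skip a review with no date payload instead of aborting the whole scrape.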
...
date = json.loads(date_tag[0].string)
>>> print(date)
{'publishedDate': '2021-01-04T21:57:34+00:00', 'updatedDate': None, 'reportedDate': None}
...
...
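One more thing to watch for once the JSON loads: the `publishedDate` value printed above ends in `+00:00`, not a literal `Z`, so the `"%Y-%m-%dT%H:%M:%SZ"` format from the question will not match it. A minimal sketch of one way to parse it, assuming Python 3.7 or later and that the site keeps returning ISO 8601 timestamps in this shape:

```python
from datetime import datetime, timezone

# Sample value copied from the output shown above.
published = "2021-01-04T21:57:34+00:00"

# The format string from the question expects a trailing literal "Z",
# so strptime rejects a value that ends in "+00:00".
try:
    datetime.strptime(published, "%Y-%m-%dT%H:%M:%SZ")
    old_format_ok = True
except ValueError:
    old_format_ok = False

# fromisoformat understands the "+00:00" offset and returns an aware datetime.
dt = datetime.fromisoformat(published)

print(old_format_ok)
print(dt.isoformat())
```

If `strptime` is preferred, the format `"%Y-%m-%dT%H:%M:%S%z"` should also accept this value on Python 3.7+, since `%z` handles offsets written with a colon.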