Scrape data from a graph generated with Highcharts

I'm trying to scrape the historical prices of several games from a website.

The site uses Highcharts.js to generate a chart with two series from the historical data. An example page is https://gg.deals/game/snowrunner/.

I can access the data with JavaScript:

Highcharts.charts[0].series[0].data

Highcharts.charts[0].series[1].data 

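However, I'd like to know whether there is another way to get the data without parsing JavaScript code.

One option is to use the API that feeds data to the chart.

Note: there are two different sources, data-without-keyshops-url and data-with-keyshops-url.

Step #1 - Build dataUrl from the product page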
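As a rough sketch of how the two sources in the note could be handled together, the helper below turns the container's attributes into absolute URLs. The function name `build_data_urls`, the short labels, and the sample attribute values are made up for illustration; only the two attribute names and the base URL come from this answer.

```python
from urllib.parse import urljoin

BASE_URL = "https://gg.deals"

# Attribute names taken from the note above; the short labels are illustrative.
SOURCE_ATTRS = {
    "without_keyshops": "data-without-keyshops-url",
    "with_keyshops": "data-with-keyshops-url",
}


def build_data_urls(attrs, base_url=BASE_URL):
    """Map each source label to an absolute URL, skipping missing attributes."""
    return {
        label: urljoin(base_url, attrs[name])
        for label, name in SOURCE_ATTRS.items()
        if name in attrs
    }


# Made-up attribute values, standing in for the real chart container.
sample_attrs = {
    "id": "historical-chart-container",
    "data-without-keyshops-url": "/example/chart/?keyshops=0",
    "data-with-keyshops-url": "/example/chart/?keyshops=1",
}
urls = build_data_urls(sample_attrs)
```

With the soup from step #1, the real attributes would presumably be passed in as `soup.select_one('#historical-chart-container').attrs`, which BeautifulSoup exposes as a plain dict.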

soup = bs(requests.get('https://gg.deals/game/snowrunner/').text, 'lxml')

dataUrl = 'https://gg.deals'+soup.select_one('#historical-chart-container')['data-without-keyshops-url']

Step #2 - Request the JSON data

r  = requests.get(dataUrl, headers={'X-Requested-With': 'XMLHttpRequest'})
r.json()

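Example

import requests
from bs4 import BeautifulSoup as bs

# Fetch the product page and parse it
url = 'https://gg.deals/game/snowrunner/'
r = requests.get(url)
soup = bs(r.text, 'lxml')

# The chart container carries the relative URL of the JSON endpoint
dataUrl = 'https://gg.deals' + soup.select_one('#historical-chart-container')['data-without-keyshops-url']

# Request the chart data with the X-Requested-With header set, as in step #2
r = requests.get(dataUrl, headers={'X-Requested-With': 'XMLHttpRequest'})
r.json()['chartData']['deals']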

Output

[{'x': 1581956320000,
  'y': 39.99,
  'shop': 'Epic Games Store',
  'name': '17 Feb 2020 16:18 - 22 May 2020 15:38'},
 {'x': 1590161908000,
  'y': 29.99,
  'shop': 'Epic Games Store',
  'name': '22 May 2020 15:38 - 11 Jun 2020 17:11'},
 {'x': 1591895501000,
  'y': 39.99,
  'shop': 'Epic Games Store',
  'name': '11 Jun 2020 17:11 - 16 Jun 2020 13:18'},
 {'x': 1592313517000,
  'y': 31.99,
  'shop': 'Epic Games Store',
  'name': '16 Jun 2020 13:18 - 30 Jun 2020 13:20'},
 {'x': 1593523255000,
  'y': 39.99,
  'shop': 'Epic Games Store',
  'name': '30 Jun 2020 13:20 - 23 Jul 2020 16:08'},
 {'x': 1595520520000,
  'y': 31.99,
  'shop': 'Epic Games Store',
  'name': '23 Jul 2020 16:08 - 6 Aug 2020 15:03'},
 {'x': 1596726196000,
  'y': 39.99,
  'shop': 'Epic Games Store',
  'name': '6 Aug 2020 15:03 - 24 Sep 2020 16:19'},...]
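The `x` values in this output look like Unix timestamps in milliseconds: the first one converts to 17 Feb 2020 16:18 UTC, which matches the start of its `name` field. Assuming that holds for every point, a minimal sketch for turning the entries into (datetime, price, shop) rows could look like this. The helper name `to_rows` is hypothetical, and the two sample entries are copied from the output above.

```python
from datetime import datetime, timezone

# Two entries copied from the output above.
deals = [
    {'x': 1581956320000, 'y': 39.99, 'shop': 'Epic Games Store',
     'name': '17 Feb 2020 16:18 - 22 May 2020 15:38'},
    {'x': 1590161908000, 'y': 29.99, 'shop': 'Epic Games Store',
     'name': '22 May 2020 15:38 - 11 Jun 2020 17:11'},
]


def to_rows(points):
    """Convert chart points to (UTC datetime, price, shop) tuples.

    Assumes 'x' is a Unix timestamp in milliseconds.
    """
    return [
        (datetime.fromtimestamp(p['x'] / 1000, tz=timezone.utc), p['y'], p['shop'])
        for p in points
    ]


rows = to_rows(deals)
for when, price, shop in rows:
    print(when.strftime('%Y-%m-%d %H:%M'), price, shop)
```

In practice `deals` would be the list returned by `r.json()['chartData']['deals']`.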