Scrape data from a graph generated with Highcharts
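I am trying to scrape the historical prices of several games from a website. The site uses Highcharts.js to render a chart with two series from the historical data. An example page is https://gg.deals/game/snowrunner/.
I can access the data with JavaScript:
Highcharts.charts[0].series[0].data
and
Highcharts.charts[0].series[1].data
However, I would like to know whether there is another way to get the data that does not require parsing the JavaScript code.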
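One option is to call the API that supplies the data to the chart.
Note: there are two different sources, data-without-keyshops-url and data-with-keyshops-url.
Step #1 - build dataUrl from the product page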
soup = bs(requests.get('https://gg.deals/game/snowrunner/').text, 'lxml')
dataUrl = 'https://gg.deals'+soup.select_one('#historical-chart-container')['data-without-keyshops-url']
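Step #1 can also be sketched with the standard library alone, which makes it easy to collect both sources at once. This is a minimal sketch, not the site's documented interface: the class name ChartUrlParser, the helper chart_urls and the sample markup are hypothetical, and it assumes the attribute values are paths relative to https://gg.deals, as the string concatenation above implies.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://gg.deals"
CONTAINER_ID = "historical-chart-container"
URL_ATTRS = ("data-without-keyshops-url", "data-with-keyshops-url")


class ChartUrlParser(HTMLParser):
    """Collect the data-*-url attributes of the chart container element."""

    def __init__(self):
        super().__init__()
        self.urls = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id") != CONTAINER_ID:
            return
        for name in URL_ATTRS:
            if attrs.get(name):
                # urljoin copes with relative paths as well as absolute URLs.
                self.urls[name] = urljoin(BASE_URL, attrs[name])


def chart_urls(html):
    """Return a dict mapping each source attribute to an absolute URL."""
    parser = ChartUrlParser()
    parser.feed(html)
    return parser.urls


# Hypothetical markup, used only to exercise the parser offline.
sample_html = (
    '<div id="historical-chart-container" '
    'data-without-keyshops-url="/example/without/" '
    'data-with-keyshops-url="/example/with/"></div>'
)
print(chart_urls(sample_html))
```

Using urljoin instead of '+' avoids a doubled or missing slash if the attribute format ever changes; on a real page, pass the text of the product page response to chart_urls.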
Step #2 - request the JSON data
r = requests.get(dataUrl, headers={'X-Requested-With': 'XMLHttpRequest'})
r.json()
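Before indexing into the decoded JSON, it may help to check its shape so that a changed or blocked response fails with a clear message. The sketch below is an illustration under assumptions: only the 'chartData' and 'deals' keys are confirmed by this post, the helper name get_series is hypothetical, and the sample payload simply reuses one data point from the output shown in this answer.

```python
def get_series(payload, series="deals"):
    """Return the list of points for one series from the decoded JSON."""
    chart_data = payload.get("chartData") if isinstance(payload, dict) else None
    if not isinstance(chart_data, dict):
        raise ValueError("response has no 'chartData' object")
    points = chart_data.get(series)
    if not isinstance(points, list):
        raise ValueError(f"'chartData' has no list named {series!r}")
    return points


# Offline stand-in for r.json(); the real payload may hold more keys.
sample_payload = {
    "chartData": {
        "deals": [
            {"x": 1581956320000, "y": 39.99, "shop": "Epic Games Store",
             "name": "17 Feb 2020 16:18 - 22 May 2020 15:38"},
        ]
    }
}
print(len(get_series(sample_payload)))
```

With a live response this would be called as get_series(r.json()).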
Example
import requests
from bs4 import BeautifulSoup as bs

# Product page that hosts the Highcharts chart
url = 'https://gg.deals/game/snowrunner/'
r = requests.get(url)
soup = bs(r.text, 'lxml')
# The chart container carries the relative URL of its JSON data source
dataUrl = 'https://gg.deals'+soup.select_one('#historical-chart-container')['data-without-keyshops-url']
# Send the same header an XHR request would carry
r = requests.get(dataUrl, headers={'X-Requested-With': 'XMLHttpRequest'})
r.json()['chartData']['deals']
Output
[{'x': 1581956320000,
'y': 39.99,
'shop': 'Epic Games Store',
'name': '17 Feb 2020 16:18 - 22 May 2020 15:38'},
{'x': 1590161908000,
'y': 29.99,
'shop': 'Epic Games Store',
'name': '22 May 2020 15:38 - 11 Jun 2020 17:11'},
{'x': 1591895501000,
'y': 39.99,
'shop': 'Epic Games Store',
'name': '11 Jun 2020 17:11 - 16 Jun 2020 13:18'},
{'x': 1592313517000,
'y': 31.99,
'shop': 'Epic Games Store',
'name': '16 Jun 2020 13:18 - 30 Jun 2020 13:20'},
{'x': 1593523255000,
'y': 39.99,
'shop': 'Epic Games Store',
'name': '30 Jun 2020 13:20 - 23 Jul 2020 16:08'},
{'x': 1595520520000,
'y': 31.99,
'shop': 'Epic Games Store',
'name': '23 Jul 2020 16:08 - 6 Aug 2020 15:03'},
{'x': 1596726196000,
'y': 39.99,
'shop': 'Epic Games Store',
'name': '6 Aug 2020 15:03 - 24 Sep 2020 16:19'},...]
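The 'x' values look like Unix timestamps in milliseconds, which is the usual convention for Highcharts datetime axes. A minimal sketch of turning the points into readable rows follows; the helper name to_rows is hypothetical, and the two sample points are copied from the output above.

```python
from datetime import datetime, timezone

# First two entries copied from the output above.
deals = [
    {"x": 1581956320000, "y": 39.99, "shop": "Epic Games Store",
     "name": "17 Feb 2020 16:18 - 22 May 2020 15:38"},
    {"x": 1590161908000, "y": 29.99, "shop": "Epic Games Store",
     "name": "22 May 2020 15:38 - 11 Jun 2020 17:11"},
]


def to_rows(points):
    """Return (UTC datetime, price, shop) tuples from chart points."""
    rows = []
    for p in points:
        # 'x' is in milliseconds; datetime.fromtimestamp expects seconds.
        when = datetime.fromtimestamp(p["x"] / 1000, tz=timezone.utc)
        rows.append((when, p["y"], p["shop"]))
    return rows


for when, price, shop in to_rows(deals):
    print(when.isoformat(), price, shop)
```

Interpreted as UTC, the first point becomes 2020-02-17 16:18:40, which agrees with the start of its 'name' label, so the millisecond assumption appears to hold for this data.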