Problems converting JSON to CSV
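I am trying to get some statistics from the NBA stats page. I am following the idea in this tutorial:
https://towardsdatascience.com/using-python-pandas-and-plotly-to-generate-nba-shot-charts-e28f873a99cb
The basic idea is to put the data into a CSV file.
So I tried the code below, which is meant to fetch the data from the NBA website as JSON and convert it to CSV: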
import requests
import json
import pandas as pd
from pandas import DataFrame as df
import urllib.request
shot_data_url_start="https://stats.nba.com/events/?flag=3&CFID=33&CFPARAMS=2017-18&PlayerID="
player_id="202695"
shot_data_url_end="&ContextMeasure=FGA&Season=2017-18&section=player&sct=plot"
def shoy_chart(player_id):
    full_url = shot_data_url_start + str(player_id) + shot_data_url_end
    json = requests.get(full_url, headers=headers).json()
    return(json)
data = json['resultSets'][0]['rowSets']
columns = json['resultSets'][0]['headers']
df = pd.DataFrame.from_records(data, columns=columns)
This is the error the notebook shows me:
TypeError Traceback (most recent call last)
<ipython-input-42-a3452c3a4fc8> in <module>
18
19
---> 20 data = json['resultSets'][0]['rowSets']
21 columns = json['resultSets'][0]['headers']
22
TypeError: 'module' object is not subscriptable
Can anyone help me, or does anyone know another way to get the data into a .csv or Excel file?
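When you write import json, the name json refers to the Python standard library's JSON module. The json you assign inside the function is local to that function, so at module level json['resultSets'] is still indexing the module, which is what raises the TypeError. Avoid reusing the module's name for a variable. If you rename the variable to something else, such as response_json, and assign the function's return value to it, this part of your code will work.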
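The name clash described above can be reproduced without touching the network. This is only a sketch: fake_fetch is a hypothetical stand-in for the real request, and the payload is made-up sample data in the shape discussed in this thread.

```python
import json

def fake_fetch():
    # Hypothetical stand-in for requests.get(...).json(); sample data only.
    json = {"resultSets": [{"headers": ["PLAYER_ID"], "rowSet": [[202695]]}]}
    return json  # this 'json' is a local variable, not the module

fake_fetch()  # the return value is discarded, as in the question

try:
    json["resultSets"]  # at module level, 'json' is still the module
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Keeping the return value under a different name avoids the clash.
response_json = fake_fetch()
rows = response_json["resultSets"][0]["rowSet"]
```

Here error_message ends up holding the same TypeError text as the traceback in the question, while response_json can be indexed normally.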
As for the rest of the code, the page https://stats.nba.com/events/ doesn't return any JSON text; it is a regular web page with images, menus, a video player, and so on. If you want to access the API that returns the shots in JSON format, you will have to use https://stats.nba.com/stats/shotchartdetail (with the correct query string). This API endpoint is mentioned in the tutorial, in the image captioned "Chrome XHR tab and resulting json linked by url".
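One way to keep the query string manageable is to build it from a dictionary instead of concatenating strings. The sketch below uses only a handful of the parameters that appear in the URLs in this thread; build_shot_url is a hypothetical helper, and the real endpoint may well require the full parameter list.

```python
from urllib.parse import urlencode

BASE_URL = "https://stats.nba.com/stats/shotchartdetail"

def build_shot_url(player_id, season):
    # Illustrative subset of the query parameters seen in this thread;
    # the endpoint may reject requests that omit the others.
    params = {
        "ContextMeasure": "FGA",
        "LeagueID": "00",
        "PlayerID": player_id,
        "Season": season,
        "SeasonType": "Regular Season",
    }
    # urlencode escapes the values, so the space becomes '+'.
    return BASE_URL + "?" + urlencode(params)

url = build_shot_url("202695", "2017-18")
```

If you use requests, passing the same dictionary as params= to requests.get builds the query string for you.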
OK, I have changed the code like this:
import requests
import json
import pandas as pd
from pandas import DataFrame as df
import urllib.request
def shot_chart(player_id):
    full_url = "https://stats.nba.com/stats/shotchartdetail?AheadBehind=&CFID=33&CFPARAMS=2017-18&ClutchTime=&Conference=&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&Division=&EndPeriod=10&EndRange=28800&GROUP_ID=&GameEventID=&GameID=&GameSegment=&GroupID=&GroupMode=&GroupQuantity=5&LastNGames=0&LeagueID=00&Location=&Month=0&OnOff=&OpponentTeamID=0&Outcome=&PORound=0&Period=0&PlayerID=202695&PlayerID1=&PlayerID2=&PlayerID3=&PlayerID4=&PlayerID5=&PlayerPosition=&PointDiff=&Position=&RangeType=0&RookieYear=&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StartPeriod=1&StartRange=0&StarterBench=&TeamID=0&VsConference=&VsDivision=&VsPlayerID1=&VsPlayerID2=&VsPlayerID3=&VsPlayerID4=&VsPlayerID5=&VsTeamID="
    response_json = requests.get(full_url, headers=headers)
    return(response_json)
data = response_json['resultSets'][0]['rowSets']
columns = response_json['resultSets'][0]['headers']
df = pd.DataFrame.from_records(data, columns=columns)
import requests
import json
import pandas as pd
from pandas import DataFrame as df
import urllib.request
shot_data_url_start="https://stats.nba.com/stats/shotchartdetail?AheadBehind=&CFID=33&CFPARAMS=2019-20&ClutchTime=&Conference=&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&Division=&EndPeriod=10&EndRange=28800&GROUP_ID=&GameEventID=&GameID=&GameSegment=&GroupID=&GroupMode=&GroupQuantity=5&LastNGames=0&LeagueID=00&Location=&Month=0&OnOff=&OpponentTeamID=0&Outcome=&PORound=0&Period=0&PlayerID="
player_id="202330"
shot_data_url_end="&PlayerID1=&PlayerID2=&PlayerID3=&PlayerID4=&PlayerID5=&PlayerPosition=&PointDiff=&Position=&RangeType=0&RookieYear=&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StartPeriod=1&StartRange=0&StarterBench=&TeamID=0&VsConference=&VsDivision=&VsPlayerID1=&VsPlayerID2=&VsPlayerID3=&VsPlayerID4=&VsPlayerID5=&VsTeamID="
def shot_chart(player_id):
    full_url = shot_data_url_start + str(player_id) + shot_data_url_end
    response_json = requests.get(full_url).json()
    return(response_json)
data = response_json['resultSets'][0]['rowSets']
columns = response_json['resultSets'][0]['headers']
df = pd.DataFrame.from_records(data, columns=columns)
shot_chart("202330")
How about now? The notebook just appears to hang.
Try this:
import requests
import pandas as pd
shot_data_url_start = "https://stats.nba.com/stats/shotchartdetail?AheadBehind=&CFID=33&CFPARAMS=2017-18&ClutchTime=&Conference=&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&Division=&EndPeriod=10&EndRange=28800&GROUP_ID=&GameEventID=&GameID=&GameSegment=&GroupID=&GroupMode=&GroupQuantity=5&LastNGames=0&LeagueID=00&Location=&Month=0&OnOff=&OpponentTeamID=0&Outcome=&PORound=0&Period=0&PlayerID="
player_id = "204001"
shot_data_url_end = "&PlayerID1=&PlayerID2=&PlayerID3=&PlayerID4=&PlayerID5=&PlayerPosition=&PointDiff=&Position=&RangeType=0&RookieYear=&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StartPeriod=1&StartRange=0&StarterBench=&TeamID=0&VsConference=&VsDivision=&VsPlayerID1=&VsPlayerID2=&VsPlayerID3=&VsPlayerID4=&VsPlayerID5=&VsTeamID="
def get_shot_data(player_id):
    full_url = shot_data_url_start + player_id + shot_data_url_end
    data = requests.get(
        full_url,
        headers={
            "User-Agent": "PostmanRuntime/7.4.0"
        }
    )
    return data.json()
shot_results = get_shot_data(player_id)
result_sets = shot_results['resultSets']
first_result_set = result_sets[0]
row_set = first_result_set['rowSet']
set_headers = first_result_set['headers']
df = pd.DataFrame.from_records(row_set, columns=set_headers)
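Since the goal in the question is a CSV file, here is a rough sketch of that last step using only the standard library. The sample result set is made up (the column names are placeholders, not the API's actual headers), and result_set_to_csv is a hypothetical helper; it assumes the headers/rowSet layout used in the code above.

```python
import csv
import io

def result_set_to_csv(result_set, out):
    # Write the header row first, then one CSV row per shot record.
    writer = csv.writer(out)
    writer.writerow(result_set["headers"])
    writer.writerows(result_set["rowSet"])

# Made-up sample in the same shape as first_result_set above.
sample = {
    "headers": ["PLAYER_ID", "SHOT_MADE_FLAG"],
    "rowSet": [[204001, 1], [204001, 0]],
}

buffer = io.StringIO()
result_set_to_csv(sample, buffer)
csv_text = buffer.getvalue()
```

To write a real file, pass open("shots.csv", "w", newline="") instead of the StringIO buffer. With the DataFrame built above, df.to_csv("shots.csv", index=False) should give an equivalent file; the file name is just an example.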
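I can see how you got confused by this Medium post. You missed the headers, and the URL you used for the NBA API was incorrect. That is what @pierre was trying to say in his response. If you re-read the post you are following, you will see that the author says he had to dig into the developer tools to find the actual URL.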
Edit: I forgot to mention that when I did not pass User-Agent in the headers, the request timed out. If you do not pass it, you will not get a successful response.
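Because a request without a User-Agent can hang, it may also help to set an explicit timeout so the notebook fails fast instead of waiting indefinitely. The sketch below uses the standard library rather than requests; build_request and fetch_json are hypothetical helpers, and the 10-second timeout is an arbitrary assumption.

```python
import json
import urllib.request

def build_request(url, user_agent="PostmanRuntime/7.4.0"):
    # Attach the User-Agent header mentioned in the answer above.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def fetch_json(url, timeout=10):
    # Gives up after 'timeout' seconds instead of hanging forever.
    # Not called in this sketch, so nothing is sent over the network.
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

request = build_request("https://stats.nba.com/stats/shotchartdetail")
```

With requests, the equivalent is to pass timeout=10 alongside headers= in requests.get.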