Problems converting JSON to CSV

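I am trying to get some stats from the NBA stats page. I am following the idea of this tutorial: https://towardsdatascience.com/using-python-pandas-and-plotly-to-generate-nba-shot-charts-e28f873a99cb

The basic idea is to put the data into a CSV file.

So I tried this code, which fetches the data from the NBA website, tries to get the JSON file, and converts it to CSV: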

import requests
import json
import pandas as pd
from pandas import DataFrame as df
import urllib.request



shot_data_url_start="https://stats.nba.com/events/?flag=3&CFID=33&CFPARAMS=2017-18&PlayerID="
player_id="202695"
shot_data_url_end="&ContextMeasure=FGA&Season=2017-18&section=player&sct=plot"

def shoy_chart(player_id):
   full_url = shot_data_url_start + str(player_id) + shot_data_url_end
   json = requests.get(full_url, headers=headers).json()
   return(json)



data = json['resultSets'][0]['rowSets']
columns = json['resultSets'][0]['headers']


df = pd.DataFrame.from_records(data, columns=columns)

This is the error the notebook shows me:

TypeError                                 Traceback (most recent call last)
<ipython-input-42-a3452c3a4fc8> in <module>
 18 
 19 
 ---> 20 data = json['resultSets'][0]['rowSets']
 21 columns = json['resultSets'][0]['headers']
 22 

 TypeError: 'module' object is not subscriptable

Can anyone help me, or does anyone know another way to get the data into a .csv or Excel file?

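When you import with import json, the name json refers to the Python standard library's JSON module, so you should not reuse it as an ordinary variable name. In your code the json assigned inside the function is local to that function (which is also never called), so at module level json['resultSets'] tries to index the module itself, hence the TypeError. If you rename your variable to something else, for example response_json, and assign it the function's return value, this part of your code will work.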
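
The name clash described above can be reproduced without any network access. This is only a minimal sketch: `fetch_payload` and its sample dictionary are hypothetical stand-ins for the real request and are not part of the original code.

```python
import json  # binds the name `json` to the standard-library module


def fetch_payload():
    # Hypothetical stand-in for requests.get(...).json(); the sample data is made up.
    json = {"resultSets": [{"headers": ["PLAYER_ID"], "rowSet": [[202695]]}]}
    # This `json` is local to the function and disappears when it returns.
    return json


# At module level `json` is still the module, so indexing it fails.
try:
    json["resultSets"]
except TypeError as exc:
    error_message = str(exc)

# Calling the function and storing the result under a different name works.
response_json = fetch_payload()
columns = response_json["resultSets"][0]["headers"]
```

Here `error_message` holds the same "not subscriptable" complaint as the traceback in the question, while `columns` is read from the returned dictionary without any problem.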

As for the rest of the code, the page https://stats.nba.com/events/ doesn't return any JSON text; it is a regular web page with images, menus, a video player, etc. If you want to access the API that returns the shots in JSON format, you will have to use https://stats.nba.com/stats/shotchartdetail (with the proper query string). This API endpoint is mentioned in the tutorial, in the "Chrome XHR tab and resulting json linked by url" image.
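
One way to handle the "proper query string" is to parse a URL copied from the browser's XHR tab and swap only the values that change, rather than concatenating string fragments by hand. The sketch below uses only the standard library. `TEMPLATE_URL` is a deliberately shortened illustration (the real request carries many more parameters), and `url_for_player` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Shortened illustration of a shotchartdetail URL; the real request copied
# from the browser has many more parameters.
TEMPLATE_URL = (
    "https://stats.nba.com/stats/shotchartdetail"
    "?ContextMeasure=FGA&LeagueID=00&PlayerID=202695"
    "&Season=2017-18&SeasonType=Regular+Season&TeamID=0&VsTeamID="
)


def url_for_player(template_url, player_id, season):
    """Return template_url with PlayerID and Season replaced."""
    parts = urlsplit(template_url)
    # keep_blank_values=True preserves empty parameters such as "VsTeamID=".
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params["PlayerID"] = str(player_id)
    params["Season"] = season
    # urlencode re-escapes the values (a space becomes "+").
    return urlunsplit(parts._replace(query=urlencode(params)))


full_url = url_for_player(TEMPLATE_URL, 202330, "2019-20")
```

Whether the endpoint accepts a reduced parameter set is not something this sketch establishes, so in practice the full URL from the browser is the safer template.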

OK, I have changed the code like this:

import requests
import json
import pandas as pd
from pandas import DataFrame as df
import urllib.request


def shot_chart(player_id):
    full_url = "https://stats.nba.com/stats/shotchartdetail?AheadBehind=&CFID=33&CFPARAMS=2017-18&ClutchTime=&Conference=&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&Division=&EndPeriod=10&EndRange=28800&GROUP_ID=&GameEventID=&GameID=&GameSegment=&GroupID=&GroupMode=&GroupQuantity=5&LastNGames=0&LeagueID=00&Location=&Month=0&OnOff=&OpponentTeamID=0&Outcome=&PORound=0&Period=0&PlayerID=202695&PlayerID1=&PlayerID2=&PlayerID3=&PlayerID4=&PlayerID5=&PlayerPosition=&PointDiff=&Position=&RangeType=0&RookieYear=&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StartPeriod=1&StartRange=0&StarterBench=&TeamID=0&VsConference=&VsDivision=&VsPlayerID1=&VsPlayerID2=&VsPlayerID3=&VsPlayerID4=&VsPlayerID5=&VsTeamID="
    response_json = requests.get(full_url, headers=headers)
    return(response_json)



    data = response_json['resultSets'][0]['rowSets']
    columns = response_json['resultSets'][0]['headers']


    df = pd.DataFrame.from_records(data, columns=columns)
import requests
import json
import pandas as pd
from pandas import DataFrame as df
import urllib.request



shot_data_url_start="https://stats.nba.com/stats/shotchartdetail?AheadBehind=&CFID=33&CFPARAMS=2019-20&ClutchTime=&Conference=&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&Division=&EndPeriod=10&EndRange=28800&GROUP_ID=&GameEventID=&GameID=&GameSegment=&GroupID=&GroupMode=&GroupQuantity=5&LastNGames=0&LeagueID=00&Location=&Month=0&OnOff=&OpponentTeamID=0&Outcome=&PORound=0&Period=0&PlayerID="
player_id="202330"
shot_data_url_end="&PlayerID1=&PlayerID2=&PlayerID3=&PlayerID4=&PlayerID5=&PlayerPosition=&PointDiff=&Position=&RangeType=0&RookieYear=&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StartPeriod=1&StartRange=0&StarterBench=&TeamID=0&VsConference=&VsDivision=&VsPlayerID1=&VsPlayerID2=&VsPlayerID3=&VsPlayerID4=&VsPlayerID5=&VsTeamID="

def shot_chart(player_id):
    full_url = shot_data_url_start + str(player_id) + shot_data_url_end
    response_json = requests.get(full_url).json()
    return(response_json)



    data = response_json['resultSets'][0]['rowSets']
    columns = response_json['resultSets'][0]['headers']


    df = pd.DataFrame.from_records(data, columns=columns)

shot_chart("202330")

How about now? The notebook does not show any result.

Try this:

import requests  # needed for requests.get below
import pandas as pd

shot_data_url_start = "https://stats.nba.com/stats/shotchartdetail?AheadBehind=&CFID=33&CFPARAMS=2017-18&ClutchTime=&Conference=&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&Division=&EndPeriod=10&EndRange=28800&GROUP_ID=&GameEventID=&GameID=&GameSegment=&GroupID=&GroupMode=&GroupQuantity=5&LastNGames=0&LeagueID=00&Location=&Month=0&OnOff=&OpponentTeamID=0&Outcome=&PORound=0&Period=0&PlayerID="
player_id = "204001"
shot_data_url_end = "&PlayerID1=&PlayerID2=&PlayerID3=&PlayerID4=&PlayerID5=&PlayerPosition=&PointDiff=&Position=&RangeType=0&RookieYear=&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StartPeriod=1&StartRange=0&StarterBench=&TeamID=0&VsConference=&VsDivision=&VsPlayerID1=&VsPlayerID2=&VsPlayerID3=&VsPlayerID4=&VsPlayerID5=&VsTeamID="

def get_shot_data(player_id):
    full_url = shot_data_url_start + player_id + shot_data_url_end
    data = requests.get(
        full_url,
        headers = {
            "User-Agent": "PostmanRuntime/7.4.0"
        }
    )
    return data.json()

shot_results = get_shot_data(player_id)

result_sets = shot_results['resultSets']

first_result_set = result_sets[0]
row_set = first_result_set['rowSet']
set_headers = first_result_set['headers']

df = pd.DataFrame.from_records(row_set, columns=set_headers)
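
Since the question asks for a CSV file, the same headers and rowSet pair can also be written out directly. The following is a sketch using only the standard library csv module. The sample response is made up (its column names and values are illustrative), and `result_set_to_csv` is a hypothetical helper.

```python
import csv
import io

# Made-up sample shaped like the response used in the code above.
sample_response = {
    "resultSets": [
        {
            "headers": ["PLAYER_ID", "LOC_X", "LOC_Y", "SHOT_MADE_FLAG"],
            "rowSet": [[204001, -12, 45, 1], [204001, 130, 88, 0]],
        }
    ]
}


def result_set_to_csv(response, out_file, index=0):
    """Write one result set (header row plus data rows) to out_file."""
    result_set = response["resultSets"][index]
    writer = csv.writer(out_file)
    writer.writerow(result_set["headers"])
    writer.writerows(result_set["rowSet"])
    return len(result_set["rowSet"])


# An in-memory buffer keeps the example free of side effects.
buffer = io.StringIO()
rows_written = result_set_to_csv(sample_response, buffer)
csv_text = buffer.getvalue()
```

To write a real file, pass `open("shots.csv", "w", newline="")` instead of the buffer (the file name is just an example). With the DataFrame built above, `df.to_csv("shots.csv", index=False)` produces an equivalent file.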

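I can see how that Medium post confused you. You were missing the headers, and the URL you used for the NBA API was incorrect. That is what @pierre was getting at in his response: the URL you used is not the right one. If you reread the post you are following, you will see the author says he had to dig into the developer tools to find the actual URL to request.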

Edit: I forgot to mention that the request timed out when I did not pass a User-Agent in the headers. If you don't pass it, you won't get a successful response.
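
Because a missing User-Agent reportedly makes the request hang, it may help to set an explicit timeout so the notebook fails fast instead of waiting indefinitely. Below is a hedged sketch using the standard library urllib.request. `build_request` and `fetch_json` are hypothetical helper names, the User-Agent value is simply the one used in the code above, and whether the server accepts it may change over time.

```python
import json
import urllib.request

# User-Agent value taken from the code above; other values may also work (assumption).
DEFAULT_HEADERS = {"User-Agent": "PostmanRuntime/7.4.0"}


def build_request(url, headers=DEFAULT_HEADERS):
    """Create a request object that carries the custom headers."""
    return urllib.request.Request(url, headers=dict(headers))


def fetch_json(url, timeout=10):
    """Fetch url and decode the JSON body; an error is raised if the
    server does not respond within `timeout` seconds."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.load(resp)


# Building the request does not touch the network; fetch_json does.
request = build_request("https://stats.nba.com/stats/shotchartdetail")
```

With requests, the equivalent is to pass both arguments in one call, as in `requests.get(full_url, headers=DEFAULT_HEADERS, timeout=10)`.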