
Scraping data from Basketball Reference, but it is not looping through the full URL

The code only loops up to this point in the URL, 'https://www.basketball-reference.com/teams/{0}', and nothing after it, so it scrapes the wrong data from the wrong URL.
```python
import pandas as pd

team_abbrev = pd.read_csv(r'C:\Users\micha\OneDrive\Desktop\NBA\team_abbreviations.csv')

for i in team_abbrev:
    url = ('https://www.basketball-reference.com/teams/{0}/2022/gamelog-advanced/#tgl_advanced').format(i)

    team_perf = pd.read_html(url)[0]
```

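You are not iterating over the rows of the .csv or pandas DataFrame: `for i in team_abbrev` iterates over the DataFrame's column labels, not its values. Load the CSV into a DataFrame, then iterate over the rows of that DataFrame: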

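```python
import pandas as pd
import requests


def baskiceball():
    filename = 'C:/Users/Me/Desktop/teams.csv'
    df = pd.read_csv(filename)
    # Iterate over the data rows, not over the DataFrame itself (which yields column labels)
    for index, row in df.iterrows():
        for x in range(len(row)):
            abbrev = str(row.iloc[x]).strip()  # positional access; strip stray whitespace
            url = f'https://www.basketball-reference.com/teams/{abbrev}/2022/gamelog-advanced/#tgl_advanced'
            r = requests.get(url, timeout=10)
            data = r.status_code  # 200 means the page was found
            print(f"{abbrev} | {data}")


baskiceball()
```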
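To see why the loop in the question builds the wrong URL, here is a minimal sketch using a small in-memory DataFrame (built with `io.StringIO` as a stand-in for the real CSV file, an assumption for illustration). Iterating over a DataFrame directly yields its column labels, whereas iterating over one of its columns yields the values:

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for team_abbreviations.csv
csv_text = "team_abbreviation\nSAC\nGSW\n"
team_abbrev = pd.read_csv(io.StringIO(csv_text))

# Iterating over a DataFrame yields its column labels, not its rows
labels = [i for i in team_abbrev]
print(labels)  # ['team_abbreviation']

# Iterating over a single column yields the actual values
values = [i for i in team_abbrev['team_abbreviation']]
print(values)  # ['SAC', 'GSW']
```

So in the question's loop, `i` is the header text rather than a team abbreviation, which is why the resulting URLs point at the wrong pages.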

My teams.csv file contains the team abbreviations in a single column:

```
team_abbreviation
SAC
GSW
```

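You insert each value from the row into the URL path.

You make the request with `requests.get(url)`.

You read the response. In this case I used `r.status_code` because the URL returns an HTML page, not JSON, and I only wanted to show that the requests succeed. Result: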

```
SAC | 200
GSW | 200
```
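Because teams.csv has only one column, a possible simplification is to skip `iterrows()` and read that column directly. The sketch below assumes the column is named `team_abbreviation` as shown above; `build_gamelog_urls` and `URL_TEMPLATE` are hypothetical names for illustration, and an in-memory CSV stands in for the real file:

```python
import io

import pandas as pd

# Hypothetical template; the #tgl_advanced fragment is left off because it only
# points a browser at an anchor on the page and does not change which page is fetched
URL_TEMPLATE = 'https://www.basketball-reference.com/teams/{}/2022/gamelog-advanced/'


def build_gamelog_urls(df, column='team_abbreviation'):
    """Build one game-log URL per team abbreviation in `column` (hypothetical helper)."""
    return [URL_TEMPLATE.format(abbrev.strip()) for abbrev in df[column].astype(str)]


if __name__ == '__main__':
    # In-memory stand-in for teams.csv; replace with pd.read_csv(filename)
    df = pd.read_csv(io.StringIO("team_abbreviation\nSAC\nGSW\n"))
    for url in build_gamelog_urls(df):
        print(url)
```

Each URL can then be passed to `requests.get` or `pd.read_html` exactly as in the loops above.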