Scraping data from Basketball Reference: the loop is not using the full URL
My code only fills in the URL up to 'https://www.basketball-reference.com/teams/{0}' and nothing valid after that point, so it scrapes the wrong data from the wrong URL.
import pandas as pd

team_abbrev = pd.read_csv(r'C:\Users\micha\OneDrive\Desktop\NBA\team_abbreviations.csv')
for i in team_abbrev:
    url = ('https://www.basketball-reference.com/teams/{0}/2022/gamelog-advanced/#tgl_advanced').format(i)
    team_perf = pd.read_html(url)[0]
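You are not iterating over the rows of the .csv file or of the pandas DataFrame. Iterating over a DataFrame directly yields its column labels, so the URL is built from the column header rather than from a team abbreviation. First load the CSV into a DataFrame, then iterate over that DataFrame's rows: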
import pandas as pd
import requests

def baskiceball():
    filename = 'C:/Users/Me/Desktop/teams.csv'
    df = pd.read_csv(filename)
    # Visit every row, then every cell in that row.
    for index, row in df.iterrows():
        for x in range(0, len(row)):
            # row.iloc[x] is the positional equivalent of row[x].
            team = row.iloc[x]
            url = f'https://www.basketball-reference.com/teams/{team}/2022/gamelog-advanced/#tgl_advanced'
            r = requests.get(url)
            data = r.status_code
            print(f"{team}" + " | " + f"{data}")

baskiceball()
My teams.csv file contains the team abbreviations in a single column:
team_abbreviation
SAC
GSW
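Each cell value from the row is inserted into the URL path.
The request is made with r = requests.get(url).
The response is then read. Here I chose r.status_code because the URL does not return JSON and I only wanted to show that the request works. Result: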
SAC | 200
GSW | 200
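To get the tables themselves rather than just the status codes, the same idea can be sketched as below. This is only one possible approach, not a tested scraper: the helper names build_urls and fetch_game_logs, the column name team_abbreviation (taken from my sample CSV), and the 5-second pause are assumptions for illustration. The #tgl_advanced fragment is left out because URL fragments are never sent to the server, and the site may still reject or throttle automated requests.

```python
import io
import time

BASE_URL = "https://www.basketball-reference.com/teams/{abbrev}/2022/gamelog-advanced/"


def build_urls(abbreviations):
    """Map each team abbreviation to its 2022 advanced game log URL."""
    urls = {}
    for abbrev in abbreviations:
        abbrev = str(abbrev).strip()
        if abbrev:  # skip blank cells
            urls[abbrev] = BASE_URL.format(abbrev=abbrev)
    return urls


def fetch_game_logs(csv_path, column="team_abbreviation", delay_seconds=5.0):
    """Return {abbreviation: DataFrame} holding the first table on each page."""
    # Third-party imports live here so build_urls works without them installed.
    import pandas as pd
    import requests

    teams = pd.read_csv(csv_path)
    logs = {}
    # Iterate over the values of one column, not over the DataFrame itself.
    for abbrev, url in build_urls(teams[column].dropna()).items():
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        # Wrap the HTML in StringIO so read_html parses it as a document.
        logs[abbrev] = pd.read_html(io.StringIO(response.text))[0]
        time.sleep(delay_seconds)  # assumed pause between requests
    return logs


if __name__ == "__main__":
    for team, url in build_urls(["SAC", "GSW"]).items():
        print(team, url)
```

Calling fetch_game_logs('C:/Users/Me/Desktop/teams.csv') would then return one DataFrame per team, keyed by abbreviation, assuming each page loads and contains at least one HTML table.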