How to only download updated CSV files via Python requests?
I'm trying to download CSV files from:
Download sports fixtures, schedules and results as CSV, XLSX, ICS and JSON.
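I have a Python program that downloads the files I'm looking for. The problem is that some of the downloaded files are not up to date. It is currently May, yet one of the downloaded files is stale and only goes up to November, while others are actually current.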
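I can't see any pattern, and I've run out of ideas for how to fix it. I've tried touching all the files and folders involved to give them fresh timestamps, and I've tried deleting all the .pyc files. Nothing seems to make a difference. Here is the code I'm using: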
import requests

base_url = 'https://fixturedownload.com/download/'
csv_file_names = [
    'epl-2021-chelsea-EasternStandardTime.csv',
    'champions-league-2021-chelsea-EasternStandardTime.csv',
    'la-liga-2021-fc-barcelona-EasternStandardTime.csv',
    'champions-league-2021-barcelona-EasternStandardTime.csv',
    'ligue-1-2021-paris-saint-germain-EasternStandardTime.csv',
    'champions-league-2021-paris-EasternStandardTime.csv',
    'epl-2021-EasternStandardTime.csv',
    'champions-league-2021-EasternStandardTime.csv',
    'mlb-2021-baltimore-orioles-EasternStandardTime.csv',
    'nfl-2020-pittsburgh-steelers-EasternStandardTime.csv'
]

count = 0
led_count = 0
for csv in csv_file_names:
    print("Downloading...", count + 1, "of", len(csv_file_names), "-", csv)
    r = requests.get(base_url + csv, allow_redirects=True)
    # 'with' makes sure the file handle is closed after writing
    with open('/home/pi/Score-Checker/CSV-Files/' + csv, 'wb') as f:
        f.write(r.content)
    count += 1
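One way to approach the title question (only fetching files that actually changed) is an HTTP conditional request. This is a minimal sketch under stated assumptions, not something confirmed in the thread: it sends `Cache-Control: no-cache` so intermediate caches are asked to revalidate, replays the server's `ETag`/`Last-Modified` validators as `If-None-Match`/`If-Modified-Since`, treats `304 Not Modified` as "keep the local copy", and only rewrites a file whose bytes differ. Whether fixturedownload.com sends those validators or honors these headers is unverified, and `conditional_headers`, `save_if_changed` and `download` are hypothetical helper names.

```python
from pathlib import Path

import requests


def conditional_headers(etag=None, last_modified=None):
    """Ask caches to revalidate, and pass back any validators remembered
    from a previous response (assumes the server supports them)."""
    headers = {"Cache-Control": "no-cache"}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def save_if_changed(path, content):
    """Write `content` (bytes) to `path` only if it differs from the file on disk.
    Returns True when the file was (re)written."""
    path = Path(path)
    if path.exists() and path.read_bytes() == content:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(content)
    return True


def download(url, path, etag=None, last_modified=None, timeout=30):
    """Fetch `url` and save it to `path` if it changed. Returns the response's
    (ETag, Last-Modified) so the caller can pass them in on the next run."""
    r = requests.get(url, headers=conditional_headers(etag, last_modified), timeout=timeout)
    if r.status_code == 304:  # Not Modified: the local copy is still current
        return etag, last_modified
    r.raise_for_status()
    save_if_changed(path, r.content)
    return r.headers.get("ETag"), r.headers.get("Last-Modified")
```

In the question's loop, the `requests.get`/`open` pair could become `download(base_url + csv, '/home/pi/Score-Checker/CSV-Files/' + csv)`. Persisting the returned validators between runs (for example in a small JSON file) is left out of the sketch.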
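Have you tried pulling the JSON feed instead of downloading the CSV? It needs a small change to your csv_file_names list. (If you need it, we can work from your original list and just use a regex to pull out the relevant parts to put in the URL.)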
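For the regex route mentioned in that aside, here is a hedged sketch that turns the question's original filenames into the `[competition, team]` pairs the feed URL needs. The filename layout is inferred from the question's list only (`<competition>-<year>[-<team>]-EasternStandardTime.csv`), and `to_feed_parts` is a hypothetical helper name.

```python
import re

# Assumed filename layout, inferred from the question's list:
#   <competition>-<year>[-<team>]-EasternStandardTime.csv
FILENAME_RE = re.compile(
    r"^(?P<competition>.+?-\d{4})(?:-(?P<team>.+?))?-EasternStandardTime\.csv$"
)


def to_feed_parts(filename):
    """Split an original CSV filename into [competition, team] for the JSON feed URL.
    Whole-league files (no team part) get an empty string for the team."""
    m = FILENAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unexpected filename: {filename}")
    return [m.group("competition"), m.group("team") or ""]
```

Applied to the original list, `[to_feed_parts(name) for name in csv_file_names]` yields pairs such as `['ligue-1-2021', 'paris-saint-germain']` and `['epl-2021', '']`. The lazy `.+?-\d{4}` stops at the first hyphen followed by four digits, which is what keeps `ligue-1` intact.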
import requests
import pandas as pd
import re

csv_file_names = [
    ['epl-2021', 'chelsea'],
    ['champions-league-2021', 'chelsea'],
    ['la-liga-2021', 'fc-barcelona'],
    ['champions-league-2021', 'barcelona'],
    ['ligue-1-2021', 'paris-saint-germain'],
    ['champions-league-2021', 'paris'],
    ['epl-2021', ''],
    ['champions-league-2021', ''],
    ['mlb-2021', 'baltimore-orioles'],
    ['nfl-2020', 'pittsburgh-steelers']
]

for count, each in enumerate(csv_file_names, start=1):
    url = 'https://fixturedownload.com/feed/json/%s/%s' % (each[0], each[-1])
    jsonData = requests.get(url).json()
    df = pd.DataFrame(jsonData)
    csv = '%s-%s-.csv' % (each[0], each[-1])
    print("Downloading...", count, "of", len(csv_file_names), "-", csv)

    # Build a "home - away" result string; unplayed matches have no score,
    # so fill them with a 99 placeholder and blank the result afterwards
    for col in ['HomeTeamScore', 'AwayTeamScore']:
        df[col] = df[col].fillna(99).astype(int).astype(str)
    df['Result'] = df['HomeTeamScore'] + ' - ' + df['AwayTeamScore']
    df['Result'] = df['Result'].replace('99 - 99', '')

    # Rename CamelCase keys to spaced words, e.g. 'HomeTeamScore' -> 'Home Team Score'
    for col in df.columns:
        if 'Date' in col:
            newColName = 'Date'
        else:
            newColName = ' '.join(re.sub('([A-Z][a-z]+)', r' \1', re.sub('([A-Z]+)', r' \1', col)).split())
        df = df.rename(columns={col: newColName})

    df = df.drop(['Group', 'Home Team Score', 'Away Team Score'], axis=1)
    df.to_csv('/home/pi/Score-Checker/CSV-Files/' + csv, index=False)
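Because the column renaming is the least obvious step, here is a hedged sketch that pulls it into a standalone function so it can be checked without any network access. `readable_column` is a hypothetical name; `HomeTeamScore` and `AwayTeamScore` appear in the answer's code, while `MatchNumber` and `DateUtc` in the checks are assumed example keys. The `\1` backreference in each substitution puts the matched letters back after the inserted space; a replacement of just `' '` would delete them instead.

```python
import re


def readable_column(col):
    """Turn a CamelCase feed key such as 'HomeTeamScore' into 'Home Team Score'.
    Any key containing 'Date' collapses to plain 'Date'."""
    if "Date" in col:
        return "Date"
    spaced = re.sub(r"([A-Z]+)", r" \1", col)          # space before each run of capitals
    spaced = re.sub(r"([A-Z][a-z]+)", r" \1", spaced)  # space before each capitalized word
    return " ".join(spaced.split())                    # collapse the doubled spaces
```

With pandas, `df = df.rename(columns=readable_column)` applies the function to every column label in one call, replacing the inner `for col in df.columns` loop.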
Output for 'ligue-1-2021', 'paris-saint-germain':