Looping through a date range to download data from an API
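I'm downloading data from an API that has a maximum lookback period of 833 days, but from my testing I know their data goes back to 2002. Below is the code I have that takes today's date and the date 833 days earlier and turns them into two values, 'end' and 'start', which are fed into the API command. Note that they need to be strings, formatted this way, for the API to accept them.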
from datetime import datetime, timedelta

d = datetime.today()
end = str(d.year) + "-" + str(d.month) + "-" + str(d.day)
lookbook_period = 833
# Take the current date and subtract the max time delta of 833 days to get the 'start' var
time_delta = timedelta(days=lookbook_period)
now = datetime.now()
# Split the result and format it to the required timestamp type.
start = str(now - time_delta).split(" ")[0]
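As an aside, the same two strings can be built with `strftime`, which zero-pads the month and day so that `end` has the same shape as `start`. This is only a minimal sketch: it assumes the API wants `YYYY-MM-DD` (the exact format is not stated here), and `lookback_window` is a hypothetical helper name, not part of any library.

```python
from datetime import date, timedelta

MAX_LOOKBACK_DAYS = 833  # limit stated in the question


def lookback_window(today, days=MAX_LOOKBACK_DAYS):
    """Return (start, end) as zero-padded YYYY-MM-DD strings.

    Assumption: the API accepts dates in YYYY-MM-DD form.
    """
    start = today - timedelta(days=days)
    return start.strftime("%Y-%m-%d"), today.strftime("%Y-%m-%d")


start, end = lookback_window(date.today())
print(start, end)
```

Passing the date in as an argument, rather than calling `date.today()` inside the helper, makes it easy to reuse the same function for earlier windows.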
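What I want to do is download the data in 833-day chunks as dataframes and then stitch them together into one CSV or dataframe. What I have so far is below, but I'm not sure how to write a function that shifts the dates.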
def time_machine():
    df_total = pd.DataFrame()
    start_str = str(2002) + "-0" + str(5) + "-0" + str(1)
    start = datetime(2002, 5, 1)
    print(start)
    # number of days from 2002-05-01 to now
    rolling_td = timedelta(days=(datetime.today() - start).days)
    print(rolling_td, "\n")
    # API maximum number of lookback days
    max_td = timedelta(days=833)
    # The function would do something similar to this, and on each pass, call the API and save the data to a dataframe or CSV.
    s1 = start + max_td
    print(s1)
    s2 = s1 + max_td
    print(s2)
    s3 = s2 + max_td
    print(s3)
    d = datetime.today()
    end = str(d.year) + "-" + str(d.month) + "-" + str(d.day)
    print(d)
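Any suggestions or tools/libraries would be appreciated. I've been testing some things with while loops, but I'm still stumbling around in the dark on this.
Here is rough pseudocode for what I think I need, but I'm still not sure how to step on to the next section: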
while count > 0 and count > 833:
    start =
    end =
    call the API to download the first set of data
    check date range:
        get most recent date and add 833 days to it
    download next section
    repeat
if count < 833:
    calculate required dates for start and end
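If you define the date ranges first, you can loop over each 833-day period and pull its data from the API. You then need to append the data from each iteration to a dataframe (or CSV).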
import datetime as dt

# Date range to pull data over
start_date = dt.date(2002, 5, 1)
end_date = dt.date.today()
delta = dt.timedelta(days=832)  # 832 so you have a range of 833 days inclusive

# Iterating from start date, recording date ranges of 833 days
date_ranges = []
temp_start_date = start_date
while temp_start_date < end_date:
    temp_end_date = temp_start_date + delta
    if temp_end_date > end_date:
        temp_end_date = end_date
    date_ranges.append([temp_start_date, temp_end_date])
    temp_start_date = temp_end_date + dt.timedelta(days=1)

# For each date range, pass dates into API
# Initialise dataframe here
for start_date, end_date in date_ranges:
    start_date_str = start_date.strftime("%Y-%m-%d")
    end_date_str = end_date.strftime("%Y-%m-%d")
    # Input to API with start and end dates in correct string format
    # Process data into dataframe
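There should be no need to count the 833 days yourself: as you said, the API takes the start and end dates as parameters, so you only need to find those two dates for each range.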
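The same chunking idea can be wrapped in a small generator so that the download loop only sees ready-made (start, end) pairs. This is a sketch under assumptions: `iter_date_chunks` is a hypothetical name, and it treats the 833-day limit as 833 days inclusive of both endpoints, which the API documentation would need to confirm.

```python
import datetime as dt


def iter_date_chunks(start, end, max_days=833):
    """Yield consecutive (chunk_start, chunk_end) date pairs covering [start, end].

    Each chunk spans at most max_days days, counting both endpoints.
    """
    step = dt.timedelta(days=max_days - 1)
    one_day = dt.timedelta(days=1)
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + step, end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end + one_day


for chunk_start, chunk_end in iter_date_chunks(dt.date(2002, 5, 1), dt.date.today()):
    # Replace this print with the real API call for the chunk.
    print(chunk_start.isoformat(), chunk_end.isoformat())
```

The loop condition is `<=`, so a final chunk that consists of a single day is still emitted, and no chunk shares a day with its neighbour.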
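You can use pandas date_range to get a date every 833 days and then loop over them. Also, your date format looks like ISO format.
Put your own code inside the for loop to load the data into a dataframe, then output it to CSV.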
from datetime import datetime
import pandas as pd

dates_list = pd.date_range(start=datetime.now(), periods=10, freq='-833D')
for i in range(1, len(dates_list)):
    # print(dates_list[i], dates_list[i-1])
    start = dates_list[i].date().isoformat()
    end = dates_list[i-1].date().isoformat()
    print(start, end)
2019-07-15 2021-10-25
2017-04-03 2019-07-15
2014-12-22 2017-04-03
2012-09-10 2014-12-22
2010-05-31 2012-09-10
2008-02-18 2010-05-31
2005-11-07 2008-02-18
2003-07-28 2005-11-07
2001-04-16 2003-07-28
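In the output above, every boundary date appears twice (2019-07-15 is both an end and a start), and the last window reaches back before 2002. A possible variation, sketched here under assumptions, walks forward from the earliest date, makes the windows non-overlapping, and stitches the per-window results with `pd.concat`. `forward_windows` and `fetch_chunk` are hypothetical names; `fetch_chunk` is only a placeholder for the real API call and returns a dummy one-row dataframe.

```python
import pandas as pd


def forward_windows(earliest, end, max_days=833):
    """Return non-overlapping (start, end) ISO date strings covering [earliest, end]."""
    edges = pd.date_range(start=earliest, end=end, freq=f"{max_days}D")
    windows = []
    for i, left in enumerate(edges):
        if i + 1 < len(edges):
            # Stop one day before the next window begins.
            right = edges[i + 1] - pd.Timedelta(days=1)
        else:
            # The last window is clipped to the requested end date.
            right = pd.Timestamp(end)
        windows.append((left.date().isoformat(), right.date().isoformat()))
    return windows


def fetch_chunk(start, end):
    """Hypothetical placeholder for the real API request."""
    return pd.DataFrame({"start": [start], "end": [end]})


frames = [fetch_chunk(s, e) for s, e in forward_windows("2002-05-01", pd.Timestamp.today())]
df_total = pd.concat(frames, ignore_index=True)
print(df_total.head())
# df_total.to_csv("data.csv", index=False)  # uncomment to write the combined data
```

Collecting the frames in a list and concatenating once at the end avoids growing a dataframe inside the loop.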