Looping through a date range to download data from an API

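I'm downloading data from an API that has a maximum lookback period of 833 days, but from my testing I know their data goes back to 2002. The code below takes today's date and the date 833 days earlier to define two values, 'end' and 'start', which are passed to the API call. Note that they need to be strings formatted this way for the API to accept them.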

from datetime import datetime, timedelta

d = datetime.today()
end = str(d.year) + "-" + str(d.month) + "-" + str(d.day)

lookbook_period = 833

# Take the current date and subtract the max time delta of 833 days to get the 'start' var
time_delta = timedelta(days=lookbook_period)
now = datetime.now()

# Split the result and format it to the required timestamp type.
start = str(now - time_delta).split(" ")[0]

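What I want to do is download the data in 833-day chunks as dataframes and then stitch them together into a single CSV or dataframe. What I have so far is below, but I'm not sure how to write a function that shifts the dates.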


import pandas as pd

def time_machine():
    df_total = pd.DataFrame()
    
    start_str = str(2002) + "-0" + str(5) + "-0" + str(1)
    start = datetime(2002,5,1)
    print(start)
    
    # Number of days from 2002-05-01 to now
    rolling_td = timedelta(days=(datetime.today() - start).days)
    print(rolling_td, "\n")
    
    # API maximum number of lookback days
    max_td = timedelta(days=833)
    
    # The function would do something similar to this, and on each pass, calling the API and saving the data to a dataframe or CSV.


    s1 = start + max_td
    print(s1)
    s2 = s1 + max_td
    print(s2)
    s3 = s2 + max_td
    print(s3)
    
    d=datetime.today()
    end = str(d.year) + "-" + str(d.month) + "-" + str(d.day)
    print(d)

Any suggestions or tools/libraries would be appreciated. I've been testing a few things with while loops, but I'm still stumbling around in the dark on this.

Here is rough pseudocode for what I think I need, but I'm still not sure how to move on to the next section.

while count > 0 and > 833:
        start = 
        end =
        
        
    call the API first to download first set of data. 
    Check date range:
        get most recent date + 833 days to it 
            Download next section
                repeat
    
    if count < 833: 
            calculate required dates for start and end 

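If you define the date range first, you can iterate over each 833-day period and pull the data from the API for it. You then need to append the data from each iteration to a dataframe (or CSV).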

import datetime as dt

# Date range to pull data over
start_date = dt.date(2002,5,1)
end_date = dt.date.today()
delta = dt.timedelta(days=832) # 832 so you have a range of 833 days inclusive

# Iterating from start date, recording date ranges of 833 days
date_ranges = []
temp_start_date = start_date
while temp_start_date < end_date:
    temp_end_date = temp_start_date + delta 
    if temp_end_date > end_date:
        temp_end_date = end_date
    date_ranges.append([temp_start_date, temp_end_date])
    temp_start_date = temp_end_date + dt.timedelta(days=1)

# For each date range, pass dates into API
# Initialise dataframe here
for start_date, end_date in date_ranges:
    start_date_str = start_date.strftime("%Y-%m-%d")
    end_date_str = end_date.strftime("%Y-%m-%d")

    # Input to API with start and end dates in correct string format

    # Process data into dataframe
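
One way the placeholder comments in the loop might be filled in is sketched below. `fetch_chunk` and `download_all` are hypothetical names, and `fetch_chunk` here only fabricates one row per day so the sketch runs without network access; the real API call, and the shape of what it returns, are not known from the question. The assumption is that each call can be turned into a pandas dataframe.

```python
import datetime as dt

import pandas as pd


def fetch_chunk(start_str, end_str):
    """Hypothetical stand-in for the real API call.

    Returns one dummy row per day in the requested range; replace the
    body with the actual request and parsing.
    """
    days = pd.date_range(start=start_str, end=end_str, freq="D")
    return pd.DataFrame({"date": days, "value": range(len(days))})


def download_all(date_ranges):
    """Fetch every [start, end] pair and stitch the results together."""
    frames = []
    for start_date, end_date in date_ranges:
        start_str = start_date.strftime("%Y-%m-%d")
        end_str = end_date.strftime("%Y-%m-%d")
        frames.append(fetch_chunk(start_str, end_str))
    # Concatenate once at the end rather than growing a dataframe per pass
    return pd.concat(frames, ignore_index=True)


# Two small example ranges, shaped like the date_ranges list built above
example_ranges = [
    [dt.date(2002, 5, 1), dt.date(2002, 5, 3)],
    [dt.date(2002, 5, 4), dt.date(2002, 5, 5)],
]
df_total = download_all(example_ranges)
# df_total.to_csv("all_data.csv", index=False)  # optional: write a single CSV
```

Collecting the per-chunk dataframes in a list and calling `pd.concat` once keeps the loop simple and avoids repeatedly copying a growing dataframe.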

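There should be no need to count out the 833 days yourself: as you said, the API takes the start and end dates as parameters, so you only need to find those for each date range.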
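
Finding the start and end of each range can be wrapped in a small generator. The following is a minimal sketch using only the standard library. The name `iter_date_chunks` and its `max_days` parameter are made up for illustration, the end date is fixed so the result is reproducible, and it assumes the API's 833-day limit counts both the start and the end date.

```python
from datetime import date, timedelta


def iter_date_chunks(start, end, max_days=833):
    """Yield (chunk_start, chunk_end) pairs covering start..end inclusive.

    Each pair spans at most max_days calendar days, counting both
    endpoints, and consecutive pairs do not overlap.
    """
    step = timedelta(days=max_days - 1)
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + step, end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end + timedelta(days=1)


chunks = list(iter_date_chunks(date(2002, 5, 1), date(2021, 10, 25)))
for chunk_start, chunk_end in chunks:
    # isoformat() gives zero-padded YYYY-MM-DD strings
    print(chunk_start.isoformat(), chunk_end.isoformat())
```

The `<=` in the loop condition means a final one-day chunk is still emitted when the last full chunk ends the day before `end`.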

You can use pandas date_range to get a date every 833 days and then loop over them. Also, your date format looks like ISO format.

Put your code inside the for loop to load the data into a dataframe, then output to CSV.

from datetime import datetime

import pandas as pd

dates_list = pd.date_range(start=datetime.now(), periods=10, freq='-833D')

for i in range(1, len(dates_list)):
    # print(dates_list[i], dates_list[i-1])
    start = dates_list[i].date().isoformat()
    end = dates_list[i-1].date().isoformat()
    print(start, end)

Output (when run on 2021-10-25):

2019-07-15 2021-10-25
2017-04-03 2019-07-15
2014-12-22 2017-04-03
2012-09-10 2014-12-22
2010-05-31 2012-09-10
2008-02-18 2010-05-31
2005-11-07 2008-02-18
2003-07-28 2005-11-07
2001-04-16 2003-07-28
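
Note that in the output above the oldest window starts at 2001-04-16, before the 2002 data begins, and each boundary date appears in two rows, so rows could be duplicated if the API treats both dates as inclusive. A variant that walks forward from a known first day and clips the last window is sketched below. The first day of 2002-05-01 is taken from the question, the last day is fixed so the result is reproducible, and whether the API's limit is inclusive is an assumption.

```python
import pandas as pd

first_day = pd.Timestamp("2002-05-01")  # earliest date the data is said to cover
last_day = pd.Timestamp("2021-10-25")   # fixed here; use pd.Timestamp.today() in practice

# Chunk starts every 833 days, going forward from the first day
starts = pd.date_range(start=first_day, end=last_day, freq="833D")

windows = []
for chunk_start in starts:
    # End each chunk the day before the next one starts, clipped to last_day
    chunk_end = min(chunk_start + pd.Timedelta(days=832), last_day)
    windows.append((chunk_start.date().isoformat(), chunk_end.date().isoformat()))

for start, end in windows:
    print(start, end)
```

Because the number of windows comes from the start and end dates, there is no `periods` value to keep up to date as time passes.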