Iterating through a range of dates in Python with missing dates

Question

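I have a pandas DataFrame containing daily stock returns; its columns are the date and the return. I want to keep only the last day of each week, but some days are missing from the data. How should I do this?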

import pandas as pd

df = pd.read_csv('Daily_return.csv')
df.Date = pd.to_datetime(df.Date)
count = 300
for last_day in ('2017-01-01' + 7n for n in range(count)):

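At this point my brain has stopped working and I have run out of ideas. Perhaps the biggest problem is that something like "+7n" makes no sense when some dates are missing.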

Answer 1

I will create a sample dataset with 40 dates and 40 sample returns, then randomly keep 90% of the rows to simulate missing dates.

The key here is that you need to convert the date column to datetime (if it is not already) and make sure your df is sorted by date.
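A minimal sketch of why both steps matter, using three made-up rows (the values and their ordering are assumptions for illustration only): "last" means the last row in the current order, so an unsorted frame can return a date that is not the latest one.

```python
import pandas as pd

# Hypothetical data: three dates stored as strings, deliberately out of order.
df = pd.DataFrame({
    "date": ["2022-04-20", "2022-04-18", "2022-04-19"],
    "return": [0.3, 0.1, 0.2],
})

# Convert the strings to datetime64 values so the .dt accessor is available.
df["date"] = pd.to_datetime(df["date"])

# Before sorting, the last row is simply whatever row happens to come last.
last_unsorted = df["date"].iloc[-1]

# After sorting from oldest to newest, the last row is the latest date.
df = df.sort_values(by="date")
last_sorted = df["date"].iloc[-1]

print(last_unsorted, last_sorted)
```

Here the unsorted frame ends on 19 April, while the sorted frame ends on 20 April.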

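Then you can group by year/week and take the last value in each group. Because the sampling is random, if you run this repeatedly you will see that the selected date changes whenever the dropped row happens to be the last day of a week.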

Based on this approach:

import pandas as pd
import numpy as np

df = pd.DataFrame()
df['date'] = pd.date_range(start='04-18-2022',periods=40, freq='D')
df['return'] = np.random.uniform(size=40)

# Keep 90 percent of the records so we can see what happens when some days are missing
df = df.sample(frac=.9)

# In case your dates are actually strings
df['date'] = pd.to_datetime(df['date'])

# Make sure they are sorted from oldest to newest
df = df.sort_values(by='date')

df = df.groupby([df['date'].dt.isocalendar().year,
                 df['date'].dt.isocalendar().week], as_index=False).last()

print(df)

Output

       date    return
0 2022-04-24  0.299958
1 2022-05-01  0.248471
2 2022-05-08  0.506919
3 2022-05-15  0.541929
4 2022-05-22  0.588768
5 2022-05-27  0.504419
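Because the sample above is random, its output differs from run to run. As a deterministic variation, the sketch below applies the same year/week grouping to a frame shaped like the one in the question. The column names `Date` and `Return`, the values, and the two removed days are assumptions for illustration; it also uses `tail(1)` rather than `last()`, which keeps the original rows and columns unchanged.

```python
import pandas as pd

# Hypothetical stand-in for Daily_return.csv: two full ISO weeks of daily rows,
# Monday 2 January 2017 through Sunday 15 January 2017.
df = pd.DataFrame({
    "Date": pd.date_range("2017-01-02", periods=14, freq="D"),
    "Return": [0.01 * n for n in range(14)],
})

# Simulate gaps: drop Sunday 8 January (the last day of its week)
# and Wednesday 11 January (a mid-week day).
missing = pd.to_datetime(["2017-01-08", "2017-01-11"])
df = df[~df["Date"].isin(missing)]

# Sort so that the last row of each group is the latest available date.
df = df.sort_values("Date")

# Group by ISO year and ISO week, then keep the final row of each group.
iso = df["Date"].dt.isocalendar()
last_rows = df.groupby([iso["year"], iso["week"]]).tail(1).reset_index(drop=True)

print(last_rows)
```

With Sunday 8 January removed, the first week falls back to Saturday 7 January, while the second week still ends on Sunday 15 January. No fixed 7-day step is needed, since each week simply contributes whichever date it has last.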

Tags: python, loops, missing-data