How to populate dates in a dataframe using pandas in Python

Question

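I have a dataframe with two columns, Case and Date, where Date is actually the start date. I want to expand it into a time series: for each case, add three (month_num) monthly dates after the start date and drop the original date.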

Original dataframe:

   Case       Date
0     1 2010-01-01
1     2 2011-04-01
2     3 2012-08-01

After populating the dates:

   Case        Date
0     1  2010-02-01
1     1  2010-03-01
2     1  2010-04-01
3     2  2011-05-01
4     2  2011-06-01
5     2  2011-07-01
6     3  2012-09-01
7     3  2012-10-01
8     3  2012-11-01

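I tried declaring an empty dataframe with the same column names and data types, then looping over Case and month_num with for loops and appending rows to the new dataframe.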

import pandas as pd

data = [[1, '2010-01-01'],  [2, '2011-04-01'], [3, '2012-08-01']]
 
df = pd.DataFrame(data, columns = ['Case', 'Date'])

df.Date = pd.to_datetime(df.Date)

df_new = pd.DataFrame(columns=df.columns)
df_new['Case'] = pd.to_numeric(df_new['Case'])
df_new['Date'] = pd.to_datetime(df_new['Date'])

month_num = 3

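for c in df.Case:
    for m in range(1, month_num+1):
        # .copy() avoids a SettingWithCopyWarning when assigning to temp
        temp = df.loc[df['Case']==c].copy()
        temp['Date'] = temp['Date'] + pd.DateOffset(months=m)
        # appends one shifted row per iteration by re-concatenating everything
        df_new = pd.concat([df_new, temp])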

df_new.reset_index(inplace=True, drop=True)

My code works, but it takes a long time to run when the original dataframe and month_num get large. Is there a better way to do what I need? Thanks a lot!!

Answer 1

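Your performance problem is probably due to calling pd.concat inside the inner for loop. Each call builds a new dataframe and copies every row accumulated so far, so the total work grows roughly quadratically with the number of output rows.

Instead, collect the dataframes you create in the for loop in a plain list, then concatenate the list once at the end.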
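A minimal sketch of that idea, reusing the sample data from the question. The names pieces and shifted are illustrative. As an additional simplification that is not part of the answer, the loop runs only over the month offsets and shifts the whole Date column at once instead of filtering case by case.

```python
import pandas as pd

df = pd.DataFrame({
    "Case": [1, 2, 3],
    "Date": pd.to_datetime(["2010-01-01", "2011-04-01", "2012-08-01"]),
})
month_num = 3

# Collect one shifted copy of the frame per month offset in a plain list.
pieces = []
for m in range(1, month_num + 1):
    shifted = df.copy()
    shifted["Date"] = df["Date"] + pd.DateOffset(months=m)
    pieces.append(shifted)

# Concatenate once, then order the rows by case and date.
df_new = (
    pd.concat(pieces, ignore_index=True)
    .sort_values(["Case", "Date"])
    .reset_index(drop=True)
)
print(df_new)
```

With this layout pd.concat runs once, and the loop runs month_num times regardless of how many cases there are.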

Answer 2

With your input data, this is what works in my notebook:

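df2 = pd.DataFrame()

# Month-end dates for each start date; explode() yields one row per date.
# (pandas 2.2+ deprecates freq='M' in favour of 'ME'.)
df2['Date'] = df['Date'].apply(lambda x: pd.date_range(start=x, periods=3, freq='M')).explode()
df2['Date'] = pd.to_datetime(df2['Date'])  # explode() can leave object dtype

# Backward as-of merge: each date takes the Case of the latest start date on or before it.
df3 = pd.merge_asof(df2, df, on='Date')
df3['Date'] = df3['Date'] + pd.DateOffset(days=1)  # month end -> first day of next month
df3[['Case', 'Date']]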

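We create df2 and fill it with month-end dates generated from each Date in the original df.
Then df3 is the merge_asof of df2 and df, which fills in the 'Case' column.
Finally, we shift the resulting column by 1 day, which turns each month end into the first day of the following month. This relies on every start date being the first of a month, and merge_asof needs both frames sorted by Date.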
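If the start dates are not all on the first of a month, or the frame is not sorted by Date, a different technique may be easier to reason about. The sketch below is not the method from this answer: it repeats each row month_num times and adds a per-row month offset. The names out and offsets are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Case": [1, 2, 3],
    "Date": pd.to_datetime(["2010-01-01", "2011-04-01", "2012-08-01"]),
})
month_num = 3

# Repeat every original row month_num times (row order 0,0,0,1,1,1,...).
out = df.loc[df.index.repeat(month_num)].reset_index(drop=True)

# Month offsets 1..month_num, tiled once per original row.
offsets = np.tile(np.arange(1, month_num + 1), len(df))

# Add the matching number of months to each repeated start date.
out["Date"] = [
    d + pd.DateOffset(months=int(k)) for d, k in zip(out["Date"], offsets)
]
print(out)
```

The list comprehension still runs in Python, but there is no repeated concatenation and no filtering by case, and the day of the month is preserved where the target month has that day.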
