pandas: retain weekly values when merging DataFrames with different date frequencies

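I need to merge two DataFrames that have different frequencies (daily and weekly). When the weekly data is merged into the daily DataFrame, I want each weekly value to be carried onto the daily rows it covers.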

The data also contains a grouping variable, group.

import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta

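# relativedelta(day=i) sets the day of the month to i (an absolute replacement),
# so this yields 2022-01-01 through 2022-01-09, repeated once per group.
daily={'date':[datetime.date(2022,1,1)+relativedelta(day=i) for i in range(1,10)]*2,
       'group':['A']*9+['B']*9,
       'daily_value':list(range(1,10))*2}
# Weekly observations on 2022-01-01 and 2022-01-07 for each group.
weekly={'date':[datetime.date(2022,1,1),datetime.date(2022,1,7)]*2,
        'group':['A','A']+['B','B'],
        'weekly_value':[100,200,300,400]}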


daily_data=pd.DataFrame(daily)
weekly_data=pd.DataFrame(weekly)

Output of daily_data:

          date group  daily_value
0   2022-01-01     A            1
1   2022-01-02     A            2
2   2022-01-03     A            3
3   2022-01-04     A            4
4   2022-01-05     A            5
5   2022-01-06     A            6
6   2022-01-07     A            7
7   2022-01-08     A            8
8   2022-01-09     A            9
9   2022-01-01     B            1
10  2022-01-02     B            2
11  2022-01-03     B            3
12  2022-01-04     B            4
13  2022-01-05     B            5
14  2022-01-06     B            6
15  2022-01-07     B            7
16  2022-01-08     B            8
17  2022-01-09     B            9

Output of weekly_data:

         date group  weekly_value
0  2022-01-01     A           100
1  2022-01-07     A           200
2  2022-01-01     B           300
3  2022-01-07     B           400

Desired output:

desired={'date':[datetime.date(2022,1,1)+relativedelta(day=i) for i in range(1,10)]*2,
         'group':['A' for x in range(1,10)]+['B' for x in range(1,10)],
         'daily_value':[x for x in range(1,10)]*2,
         'weekly_value':[100]*6+[200]*3+[300]*6+[400]*3}

desired_data=pd.DataFrame(desired)

Output of desired_data:

          date group  daily_value  weekly_value
0   2022-01-01     A            1           100
1   2022-01-02     A            2           100
2   2022-01-03     A            3           100
3   2022-01-04     A            4           100
4   2022-01-05     A            5           100
5   2022-01-06     A            6           100
6   2022-01-07     A            7           200
7   2022-01-08     A            8           200
8   2022-01-09     A            9           200
9   2022-01-01     B            1           300
10  2022-01-02     B            2           300
11  2022-01-03     B            3           300
12  2022-01-04     B            4           300
13  2022-01-05     B            5           300
14  2022-01-06     B            6           300
15  2022-01-07     B            7           400
16  2022-01-08     B            8           400
17  2022-01-09     B            9           400

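Use merge_asof: convert the date columns to datetimes, sort both DataFrames by date, match within each group, and finally sort the result by group and date: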

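# merge_asof needs a numeric or datetime key, so convert the datetime.date objects
daily_data['date'] = pd.to_datetime(daily_data['date'])
weekly_data['date'] = pd.to_datetime(weekly_data['date'])

# Both inputs must be sorted by the `on` key; `by` restricts matches to the same group.
# With the default direction='backward', each daily row takes the latest weekly row
# whose date is on or before it.
df = (pd.merge_asof(daily_data.sort_values('date'),
                    weekly_data.sort_values('date'),
                    on='date',
                    by='group').sort_values(['group','date'], ignore_index=True))
print(df)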
         date group  daily_value  weekly_value
0  2022-01-01     A            1           100
1  2022-01-02     A            2           100
2  2022-01-03     A            3           100
3  2022-01-04     A            4           100
4  2022-01-05     A            5           100
5  2022-01-06     A            6           100
6  2022-01-07     A            7           200
7  2022-01-08     A            8           200
8  2022-01-09     A            9           200
9  2022-01-01     B            1           300
10 2022-01-02     B            2           300
11 2022-01-03     B            3           300
12 2022-01-04     B            4           300
13 2022-01-05     B            5           300
14 2022-01-06     B            6           300
15 2022-01-07     B            7           400
16 2022-01-08     B            8           400
17 2022-01-09     B            9           400
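
As a cross-check, here is a minimal self-contained sketch that rebuilds the sample data and compares the merge_asof result with a different technique: an exact left merge followed by a forward fill within each group. The use of pd.date_range to build the dates and the names asof, filled and expected are my own choices for illustration, not part of the original post. The forward-fill variant only works here because every weekly date also appears among the daily dates, which is an assumption about the data.

```python
import pandas as pd

# Rebuild the sample data with datetime64 dates (pd.date_range is an
# illustrative shortcut, not the construction used in the question).
dates = pd.date_range("2022-01-01", periods=9, freq="D")
daily_data = pd.DataFrame({
    "date": list(dates) * 2,
    "group": ["A"] * 9 + ["B"] * 9,
    "daily_value": list(range(1, 10)) * 2,
})
weekly_data = pd.DataFrame({
    "date": [dates[0], dates[6]] * 2,  # 2022-01-01 and 2022-01-07
    "group": ["A", "A", "B", "B"],
    "weekly_value": [100, 200, 300, 400],
})

# Approach 1: backward as-of match within each group.
# direction="backward" is the default and is spelled out only for clarity.
asof = pd.merge_asof(
    daily_data.sort_values("date"),
    weekly_data.sort_values("date"),
    on="date",
    by="group",
    direction="backward",
).sort_values(["group", "date"], ignore_index=True)

# Approach 2 (alternative technique): exact left merge, then forward-fill
# the gaps per group. Assumes each weekly date exists in the daily data
# and that daily_data is already ordered by date within each group.
filled = daily_data.merge(weekly_data, on=["date", "group"], how="left")
filled["weekly_value"] = (
    filled.groupby("group")["weekly_value"].ffill().astype("int64")
)

# The weekly_value column from the desired output in the question.
expected = [100] * 6 + [200] * 3 + [300] * 6 + [400] * 3
print(asof["weekly_value"].tolist() == expected)
print(filled["weekly_value"].tolist() == expected)
```

merge_asof is the more robust of the two when the weekly dates do not line up exactly with daily rows, since it matches on "latest date on or before" rather than on equality.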