Pandas 使用多索引开始和日期重新采样
Pandas resample with multiindex start- and enddate
假设我有一个具有两个索引级别的多索引 Pandas 数据框:month_begin 和 month_end
import pandas as pd
multi_index = pd.MultiIndex.from_tuples([("2022-03-01", "2022-03-31"),
("2022-04-01", "2022-04-30"),
("2022-05-01", "2022-05-31"),
("2022-06-01", "2022-06-30")])
multi_index.names = ['month_begin', 'month_end']
df = pd.DataFrame(np.random.rand(4,100), index=multi_index)
df
0 1 ... 98 99
month_begin month_end ...
2022-03-01 2022-03-31 0.322032 0.205307 ... 0.975128 0.673460
2022-04-01 2022-04-30 0.113813 0.278981 ... 0.951049 0.090765
2022-05-01 2022-05-31 0.777918 0.842734 ... 0.667831 0.274189
2022-06-01 2022-06-30 0.221407 0.555711 ... 0.745158 0.648246
我想做的是每天对数据重新采样,以便每月显示一个月中的每一天的值:
0 1 ... 98 99
...
2022-03-01 0.322032 0.205307 ... 0.975128 0.673460
2022-03-02 0.322032 0.205307 ... 0.975128 0.673460
2022-03-03 0.322032 0.205307 ... 0.975128 0.673460
...
2022-06-29 0.221407 0.555711 ... 0.745158 0.648246
2022-06-30 0.221407 0.555711 ... 0.745158 0.648246
我试图从第一个索引中检索月份
df.index.get_level_values('month_begin'),
但后来我不知道如何判断它应该每天重新采样。有人知道怎么做这个吗?非常感谢您的帮助。
使用:
#convert MultiIndex to columns
df1 = df.reset_index()
#if necessary convert to datetimes
df1['month_begin'] = pd.to_datetime(df1['month_begin'])
df1['month_end'] = pd.to_datetime(df1['month_end'])
#repeat indices by difference between end and start in days
df1 = (df1.loc[df1.index.repeat(df1['month_end'].sub(df1['month_begin']).dt.days + 1)]
.drop('month_end',1))
#creat ecounter converted to days timedeltas
s = pd.to_timedelta(df1.groupby(level=0).cumcount(), unit='d')
#add timedeltas to month_begin and convert to DatetimeIndex
df1 = (df1.assign(month_begin = df1['month_begin'].add(s))
.set_index('month_begin')
.rename_axis(None))
前 3 列的输出:
print (df1.iloc[:, :3])
0 1 2
2022-03-01 0.009359 0.499058 0.113384
2022-03-02 0.009359 0.499058 0.113384
2022-03-03 0.009359 0.499058 0.113384
2022-03-04 0.009359 0.499058 0.113384
2022-03-05 0.009359 0.499058 0.113384
... ... ...
2022-06-26 0.460350 0.241094 0.821568
2022-06-27 0.460350 0.241094 0.821568
2022-06-28 0.460350 0.241094 0.821568
2022-06-29 0.460350 0.241094 0.821568
2022-06-30 0.460350 0.241094 0.821568
[122 rows x 3 columns]
假设我有一个具有两个索引级别的多索引 Pandas 数据框:month_begin 和 month_end
import pandas as pd
multi_index = pd.MultiIndex.from_tuples([("2022-03-01", "2022-03-31"),
("2022-04-01", "2022-04-30"),
("2022-05-01", "2022-05-31"),
("2022-06-01", "2022-06-30")])
multi_index.names = ['month_begin', 'month_end']
df = pd.DataFrame(np.random.rand(4,100), index=multi_index)
df
0 1 ... 98 99
month_begin month_end ...
2022-03-01 2022-03-31 0.322032 0.205307 ... 0.975128 0.673460
2022-04-01 2022-04-30 0.113813 0.278981 ... 0.951049 0.090765
2022-05-01 2022-05-31 0.777918 0.842734 ... 0.667831 0.274189
2022-06-01 2022-06-30 0.221407 0.555711 ... 0.745158 0.648246
我想做的是每天对数据重新采样,以便每月显示一个月中的每一天的值:
0 1 ... 98 99
...
2022-03-01 0.322032 0.205307 ... 0.975128 0.673460
2022-03-02 0.322032 0.205307 ... 0.975128 0.673460
2022-03-03 0.322032 0.205307 ... 0.975128 0.673460
...
2022-06-29 0.221407 0.555711 ... 0.745158 0.648246
2022-06-30 0.221407 0.555711 ... 0.745158 0.648246
我试图从第一个索引中检索月份
df.index.get_level_values('month_begin'),
但后来我不知道如何判断它应该每天重新采样。有人知道怎么做这个吗?非常感谢您的帮助。
使用:
#convert MultiIndex to columns
df1 = df.reset_index()
#if necessary convert to datetimes
df1['month_begin'] = pd.to_datetime(df1['month_begin'])
df1['month_end'] = pd.to_datetime(df1['month_end'])
#repeat indices by difference between end and start in days
df1 = (df1.loc[df1.index.repeat(df1['month_end'].sub(df1['month_begin']).dt.days + 1)]
.drop('month_end',1))
#creat ecounter converted to days timedeltas
s = pd.to_timedelta(df1.groupby(level=0).cumcount(), unit='d')
#add timedeltas to month_begin and convert to DatetimeIndex
df1 = (df1.assign(month_begin = df1['month_begin'].add(s))
.set_index('month_begin')
.rename_axis(None))
前 3 列的输出:
print (df1.iloc[:, :3])
0 1 2
2022-03-01 0.009359 0.499058 0.113384
2022-03-02 0.009359 0.499058 0.113384
2022-03-03 0.009359 0.499058 0.113384
2022-03-04 0.009359 0.499058 0.113384
2022-03-05 0.009359 0.499058 0.113384
... ... ...
2022-06-26 0.460350 0.241094 0.821568
2022-06-27 0.460350 0.241094 0.821568
2022-06-28 0.460350 0.241094 0.821568
2022-06-29 0.460350 0.241094 0.821568
2022-06-30 0.460350 0.241094 0.821568
[122 rows x 3 columns]