Python MultiIndex - How can I create a hierarchical MultiIndex in a DataFrame that has only times as its index?
Suppose I have a DataFrame df that looks like this:
9-2021 8-2021 7-2021
Datetime
13:00:00 0.000 0.000 0.0000
13:05:00 -0.003 -0.005 0.0010
13:10:00 -0.009 -0.005 -0.0020
...
23:50:00 0.004 -0.001 0.006
23:55:00 0.006 -0.008 -.006
00:00:00 0.005 -0.001 -.003
00:05:00 0.004 -0.002 -0.008
00:10:00 -0.010 0.006 -0.001
00:15:00 0.008 0.003 -0.001
...
23:50:00 -0.001 0.005 0.009
23:55:00 0.006 -0.008 -.006
00:00:00 0.005 -0.001 -.003
00:05:00 0.004 -0.002 -0.008
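I would like to convert my index into a MultiIndex, adding a higher level that shows I have moved to the following day each time I cross midnight. It should look like the following, and so on for subsequent days. Any ideas?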
9-2021 8-2021 7-2021
Day Datetime
D 13:00:00 0.000 0.000 0.0000
13:05:00 -0.003 -0.005 0.0010
13:10:00 -0.009 -0.005 -0.0020
...
23:50:00 0.004 -0.001 0.006
23:55:00 0.006 -0.008 -.006
D+1 00:00:00 0.005 -0.001 -.003
00:05:00 0.004 -0.002 -0.008
00:10:00 -0.010 0.006 -0.001
00:15:00 0.008 0.003 -0.001
...
23:50:00 -0.001 0.005 0.009
23:55:00 0.006 -0.008 -.006
D+2 00:00:00 0.005 -0.001 -.003
00:05:00 0.004 -0.002 -0.008
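This should work for you. You can experiment with the numbering, but starting from zero is the simplest, as you will see when you work through the example.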
import io
import pandas as pd

data = '''Datetime 9-2021 8-2021 7-2021
13:00:00 0.000 0.000 0.0000
13:05:00 -0.003 -0.005 0.0010
13:10:00 -0.009 -0.005 -0.0020
23:50:00 0.004 -0.001 0.006
23:55:00 0.006 -0.008 -.006
00:00:00 0.005 -0.001 -.003
00:05:00 0.004 -0.002 -0.008
00:10:00 -0.010 0.006 -0.001
00:15:00 0.008 0.003 -0.001
23:50:00 -0.001 0.005 0.009
23:55:00 0.006 -0.008 -.006
00:00:00 0.005 -0.001 -.003
00:05:00 0.004 -0.002 -0.008'''

# whitespace-separated text; the time strings become the index
df = pd.read_csv(io.StringIO(data), sep=r'\s+', index_col='Datetime')
# use cumcount to number the occurrences of each time value (1, 2, ...)
counter = df.groupby(df.index).cumcount() + 1
# keep the count only on the midnight rows; every other row becomes NaN
counter = counter.where(df.index == '00:00:00')
# forward fill from each midnight; rows before the first midnight get 0
df['day_counter'] = counter.ffill().fillna(0).astype('int')
# set the MultiIndex: day counter on the outer level, time on the inner level
df.set_index(['day_counter', df.index], inplace=True)
df
9-2021 8-2021 7-2021
day_counter Datetime
0 13:00:00 0.000 0.000 0.000
13:05:00 -0.003 -0.005 0.001
13:10:00 -0.009 -0.005 -0.002
23:50:00 0.004 -0.001 0.006
23:55:00 0.006 -0.008 -0.006
1 00:00:00 0.005 -0.001 -0.003
00:05:00 0.004 -0.002 -0.008
00:10:00 -0.010 0.006 -0.001
00:15:00 0.008 0.003 -0.001
23:50:00 -0.001 0.005 0.009
23:55:00 0.006 -0.008 -0.006
2 00:00:00 0.005 -0.001 -0.003
00:05:00 0.004 -0.002 -0.008
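As a possible variation on the answer above, the day number can also be taken from a running count of the midnight rows, and formatted as the "D", "D+1" labels the question asks for. This is only a sketch: it assumes the index holds time strings and that every day has a row at exactly 00:00:00. The names day_number and day_label are made up for illustration, and the sample frame is a shortened copy of the one in the question.

```python
import io
import pandas as pd

# Shortened stand-in for the frame in the question (values copied from it).
data = """Datetime 9-2021 8-2021 7-2021
23:50:00 0.004 -0.001 0.006
23:55:00 0.006 -0.008 -0.006
00:00:00 0.005 -0.001 -0.003
00:05:00 0.004 -0.002 -0.008
23:55:00 0.006 -0.008 -0.006
00:00:00 0.005 -0.001 -0.003
00:05:00 0.004 -0.002 -0.008"""
df = pd.read_csv(io.StringIO(data), sep=r"\s+", index_col="Datetime")

# Assumption: each day contains a row at exactly 00:00:00, so a running count
# of those rows is the day number (0 before the first midnight, then 1, 2, ...).
day_number = (df.index == "00:00:00").cumsum()

# Optional: turn the numbers into the "D", "D+1", ... labels from the question.
day_label = ["D" if n == 0 else f"D+{n}" for n in day_number]

# Build the two-level index: day label outside, original time inside.
df.index = pd.MultiIndex.from_arrays(
    [day_label, df.index], names=["Day", "Datetime"]
)
print(df)
```

If a day could be missing its 00:00:00 row, this count would not advance for that day, so a different boundary test would be needed.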