Python MultiIndex - How can I create a hierarchical MultiIndex in a DataFrame that has only times as its index?

Suppose I have a DataFrame df that looks like this:

       9-2021   8-2021  7-2021  
Datetime                                                        
13:00:00    0.000   0.000   0.0000   
13:05:00    -0.003  -0.005  0.0010     
13:10:00    -0.009  -0.005  -0.0020 
  
...
            
23:50:00   0.004   -0.001  0.006    
23:55:00    0.006  -0.008  -.006   
00:00:00    0.005   -0.001  -.003    
00:05:00    0.004  -0.002 -0.008    
00:10:00   -0.010   0.006  -0.001   
00:15:00   0.008  0.003  -0.001

...   

23:50:00  -0.001  0.005  0.009        
23:55:00    0.006  -0.008  -.006        
00:00:00    0.005   -0.001  -.003          
00:05:00    0.004  -0.002 -0.008 

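I want to convert my index into a MultiIndex, adding a higher level that shows I have moved on to the next day each time the index crosses midnight. It should look like the following, and so on for subsequent days. Any ideas?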

            9-2021  8-2021  7-2021
Day   Datetime  
                                                    
D     13:00:00  0.000   0.000   0.0000   
      13:05:00  -0.003  -0.005  0.0010     
      13:10:00  -0.009  -0.005  -0.0020   
      ...            
      23:50:00   0.004   -0.001  0.006    
      23:55:00    0.006  -0.008  -.006   
D+1   00:00:00    0.005   -0.001  -.003    
      00:05:00    0.004  -0.002 -0.008    
      00:10:00   -0.010   0.006  -0.001   
      00:15:00   0.008  0.003  -0.001
      ...                                   

      23:50:00  -0.001  0.005  0.009        
      23:55:00    0.006  -0.008  -.006 
       
D+2   00:00:00    0.005   -0.001  -.003          
      00:05:00    0.004  -0.002 -0.008 

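This should work for you. You can experiment with the numbering, but starting from zero is simplest, as you will see when you work through the example.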

data='''       9-2021   8-2021  7-2021
Datetime
13:00:00    0.000   0.000   0.0000
13:05:00    -0.003  -0.005  0.0010
13:10:00    -0.009  -0.005  -0.0020
23:50:00   0.004   -0.001  0.006
23:55:00    0.006  -0.008  -.006
00:00:00    0.005   -0.001  -.003
00:05:00    0.004  -0.002  -0.008
00:10:00   -0.010   0.006  -0.001
00:15:00   0.008  0.003  -0.001
23:50:00  -0.001  0.005  0.009
23:55:00    0.006  -0.008  -.006
00:00:00    0.005   -0.001  -.003
00:05:00    0.004  -0.002  -0.008 '''

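import io
import numpy as np
import pandas as pd

# skip the two header lines and name the columns explicitly,
# so the time strings become the index
df = pd.read_csv(io.StringIO(data), sep=r'\s+', skiprows=2, header=None,
                 names=['Datetime', '9-2021', '8-2021', '7-2021'],
                 index_col='Datetime')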

# number the occurrences of each time of day: the n-th '00:00:00' row gets n
df['day_counter'] = df.groupby(df.index).cumcount() + 1

# keep the counter only on the midnight rows; set every other row to NaN
df.loc[df.index != '00:00:00', 'day_counter'] = np.nan

# forward fill the counter through each day; rows before the first midnight get 0
df['day_counter'] = df['day_counter'].ffill().fillna(0).astype('int')

# set multiIndex
df.set_index(['day_counter', df.index], inplace=True)

df

                      9-2021  8-2021  7-2021
day_counter Datetime
0           13:00:00   0.000   0.000   0.000
            13:05:00  -0.003  -0.005   0.001
            13:10:00  -0.009  -0.005  -0.002
            23:50:00   0.004  -0.001   0.006
            23:55:00   0.006  -0.008  -0.006
1           00:00:00   0.005  -0.001  -0.003
            00:05:00   0.004  -0.002  -0.008
            00:10:00  -0.010   0.006  -0.001
            00:15:00   0.008   0.003  -0.001
            23:50:00  -0.001   0.005   0.009
            23:55:00   0.006  -0.008  -0.006
2           00:00:00   0.005  -0.001  -0.003
            00:05:00   0.004  -0.002  -0.008
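
If you want the 'D', 'D+1', 'D+2' labels from the question rather than a numeric counter, the same idea can be sketched more compactly. This is only an illustration under the same assumptions as above: the index holds time strings and every new day contains a '00:00:00' row. The sample values and the names day_number, day_labels and result are made up for the example.

```python
import pandas as pd

# Hypothetical sample: times are plain strings, as in the example above.
times = ['13:00:00', '23:55:00', '00:00:00', '00:05:00',
         '23:55:00', '00:00:00', '00:05:00']
df = pd.DataFrame({'9-2021': [0.000, 0.006, 0.005, 0.004, 0.006, 0.005, 0.004]},
                  index=pd.Index(times, name='Datetime'))

# Each '00:00:00' row starts a new day, so a running sum of the
# midnight flags gives the day number directly (0 before the first midnight).
day_number = (df.index == '00:00:00').cumsum()

# Turn 0, 1, 2, ... into 'D', 'D+1', 'D+2', ...
day_labels = ['D' if n == 0 else f'D+{n}' for n in day_number]

# Build the two-level index from the labels and the original times.
result = df.copy()
result.index = pd.MultiIndex.from_arrays([day_labels, df.index],
                                         names=['Day', 'Datetime'])
print(result)
```

With this index, result.loc['D+1'] should return just the rows of the second day. Note that string labels sort alphabetically ('D+10' comes before 'D+2'), which is one reason the plain integer counter is easier to work with.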