Insert a level into the existing data frame so that 4 columns are grouped as one

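I want to create a MultiIndex for my data frame so that MAE, MSE, RMSE, and MPE are grouped together and given a new index level. Similarly, the remaining four columns should be grouped at the same level, but under a different name.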

> mux3 = pd.MultiIndex.from_product([list('ABCD'), list('1234')],
>                                   names=['one', 'two'])  # dummy data
> df3 = pd.DataFrame(np.random.choice(10, (3, len(mux3))), columns=mux3)  # dummy data frame
> print(df3)  # intended output required for the data frame in the picture given below

Assuming the column groups are already in the appropriate order, we can simply create an np.arange over the length of the columns and floor-divide it by 4 to get the group numbers, then build the new columns with MultiIndex.from_arrays.
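The same idea can be wrapped in a small reusable function. This is a minimal sketch: `add_group_level` is a hypothetical helper name, and raising an error when the column count is not a multiple of the group size is an assumption (otherwise the last block would silently be smaller).

```python
import numpy as np
import pandas as pd


def add_group_level(df, group_size, names=("one", "two")):
    """Hypothetical helper: prepend a level that numbers consecutive column blocks."""
    n_cols = len(df.columns)
    if n_cols % group_size != 0:
        # Assumption: refuse to build a smaller trailing group
        raise ValueError(f"{n_cols} columns cannot be split into blocks of {group_size}")
    out = df.copy()  # leave the caller's frame untouched
    out.columns = pd.MultiIndex.from_arrays(
        [np.arange(n_cols) // group_size, df.columns],  # 0,0,0,0,1,1,1,1,...
        names=list(names),
    )
    return out


df = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7, 8]], columns=list("abcdefgh"))
grouped = add_group_level(df, 4)
print(grouped.columns.get_level_values("one").tolist())  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Selecting `grouped[1]` then returns just the second block of four columns.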

Sample input and output:

import numpy as np
import pandas as pd

initial_index = [1, 2, 3, 4] * 3
np.random.seed(5)
df3 = pd.DataFrame(
    np.random.choice(10, (3, len(initial_index))), columns=initial_index
)

   1  2  3  4  1  2  3  4  1  2  3  4  # Column headers are in repeating order
0  3  6  6  0  9  8  4  7  0  0  7  1
1  5  7  0  1  4  6  2  9  9  9  9  1
2  2  7  0  5  0  0  4  4  9  3  2  4
# Create New Columns
df3.columns = pd.MultiIndex.from_arrays([
    np.arange(len(df3.columns)) // 4,  # Group Each set of 4 columns together
    df3.columns  # Keep level 1 the same as current columns
], names=['one', 'two'])  # Set Names (optional)
df3

one  0           1           2         
two  1  2  3  4  1  2  3  4  1  2  3  4
0    3  6  6  0  9  8  4  7  0  0  7  1
1    5  7  0  1  4  6  2  9  9  9  9  1
2    2  7  0  5  0  0  4  4  9  3  2  4
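To give the groups names instead of numbers, as the question asks, one option is to repeat each group label once per column. A minimal sketch, assuming the metric columns appear as MAE, MSE, RMSE, MPE in two consecutive blocks; the group labels `model_a` and `model_b` are hypothetical placeholders.

```python
import numpy as np
import pandas as pd

# Metric names from the question; the two group labels are hypothetical
metrics = ["MAE", "MSE", "RMSE", "MPE"]
groups = ["model_a", "model_b"]

np.random.seed(5)
df = pd.DataFrame(np.random.choice(10, (3, 8)), columns=metrics * 2)

# np.repeat yields one label per column: model_a four times, then model_b four times
df.columns = pd.MultiIndex.from_arrays(
    [np.repeat(groups, len(metrics)), df.columns], names=["one", "two"]
)

print(df["model_b"])  # the second block of four metric columns
```

When the columns are already in this block order, pd.MultiIndex.from_product([groups, metrics]) produces the same index without needing the existing column labels.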

If the column order is mixed:

np.random.seed(5)
df3 = pd.DataFrame(
    np.random.choice(10, (3, 8)), columns=[1, 1, 3, 2, 4, 3, 2, 4]
)
df3

   1  1  3  2  4  3  2  4  # Cannot select groups positionally
0  3  6  6  0  9  8  4  7
1  0  0  7  1  5  7  0  1
2  4  6  2  9  9  9  9  1

If needed, we can convert the columns with Index.to_series, number each repeated header with groupby cumcount, and then use sort_index to put the columns in order:

df3.columns = pd.MultiIndex.from_arrays([
    # Enumerate Groups to create new level 0 index
    df3.columns.to_series().groupby(df3.columns).cumcount(),
    df3.columns
], names=['one', 'two'])  # Set Names (optional)
# Sort to Order Correctly
# (Do not sort before setting the columns; that would break alignment with the data)
df3 = df3.sort_index(axis=1)
df3

one  0           1         
two  1  2  3  4  1  2  3  4  # Notice the data has moved with the headers
0    3  0  6  9  6  4  8  7
1    0  1  7  5  0  0  7  1
2    4  9  2  9  6  9  9  1
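To see what the cumcount step contributes, it can be run on the column labels alone. This sketch uses the same mixed header list as above; cumcount numbers each occurrence of a label 0, 1, ... in order of appearance, and that number becomes the new level 0.

```python
import pandas as pd

cols = pd.Index([1, 1, 3, 2, 4, 3, 2, 4])

# First occurrence of each label gets 0, the second gets 1, and so on
occurrence = cols.to_series().groupby(cols).cumcount()
print(occurrence.tolist())  # [0, 1, 0, 0, 0, 1, 1, 1]

# Pair occurrence number with the original label, then sort the pairs
mi = pd.MultiIndex.from_arrays([occurrence, cols], names=["one", "two"])
print(mi.sort_values().tolist())
```

Sorting the (occurrence, label) pairs is what lets sort_index(axis=1) gather each group's columns together in label order.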