使用 multi-level 在 pandas 数据帧上融化

Melt on pandas dataframes with multi-level

我有一个数据 table 看起来像这样:

原版Table

使用 df = pd.read_excel('H:/test/test.xlsx', header=[0,1,2]) 将其导入 pandas 得到此数据帧:

    Day                 Cats                   Dogs              
    Unnamed: 0_level_1  Small       Big        Little     Large    
    Unnamed: 0_level_2  young  old  young old  young old  young old
0   1                   1      2    1     5    3     2    8     6
1   2                   2      4    3     6    0     1    4     6
2   3                   3      6    6     7    9     3    5     0

我想将数据框转换成如下所示:

    Day Count  Animal   Size    Age
0   1   1      Cats     Small   young
1   2   2      Cats     Small   young
2   3   3      Cats     Small   young
3   1   2      Cats     Small   old
4   2   4      Cats     Small   old
5   3   6      Cats     Small   old
6   1   1      Cats     Big     young
7   2   3      Cats     Big     young
8   3   6      Cats     Big     young
9   1   5      Cats     Big     old
10  2   6      Cats     Big     old
11  3   7      Cats     Big     old
12  1   3      Dogs     Little  young
13  2   0      Dogs     Little  young
14  3   9      Dogs     Little  young
15  1   2      Dogs     Little  old
16  2   1      Dogs     Little  old
17  3   3      Dogs     Little  old
18  1   8      Dogs     Large   young
19  2   4      Dogs     Large   young
20  3   5      Dogs     Large   young
21  1   6      Dogs     Large   old
22  2   6      Dogs     Large   old
23  3   0      Dogs     Large   old

我试过这个方法达到的效果:

df.columns = ['_'.join(col).strip() for col in df.columns.values]
df.rename(columns={'Day_Unnamed: 0_level_1_Unnamed: 0_level_2':'Day'}, inplace=True)
df = df.melt(id_vars=['Day'], var_name='INFO', value_name='Count')
df['INFO'] = df['INFO'].str.split('_')
df['Animal'] = df['INFO'].str[0]
df['Size'] = df['INFO'].str[1]
df['Age'] = df['INFO'].str[2]
df.drop(columns='INFO',inplace=True)

我所做的是将我的 multi-level headers 展平成一个普通的数据框,并使用 melt 取消数据框的透视。然后我将 df[INFO] 列分成特定的列 'Animal'、'Size'、'Age'.

此方法有效,但我想知道是否有任何方法可以实现此目的而不必将 table 展平为纯格式并拆分 'INFO' 列,因为我想要的数据集要处理的列远不止三列。

使用melt:

out = (
  df.set_index(df.columns[0])
    .melt(var_name=['Animal', 'Size', 'Age'], value_name='Count', ignore_index=False)
    .rename_axis('Day')
    .reset_index()
)

输出:

>>> out
    Day Animal    Size    Age  Count
0     1   Cats   Small  young      1
1     2   Cats   Small  young      2
2     3   Cats   Small  young      3
3     1   Cats   Small    old      2
4     2   Cats   Small    old      4
5     3   Cats   Small    old      6
6     1   Cats     Big  young      1
7     2   Cats     Big  young      3
8     3   Cats     Big  young      6
9     1   Cats     Big    old      5
10    2   Cats     Big    old      6
11    3   Cats     Big    old      7
12    1   Dogs  Little  young      3
13    2   Dogs  Little  young      0
14    3   Dogs  Little  young      9
15    1   Dogs  Little    old      2
16    2   Dogs  Little    old      1
17    3   Dogs  Little    old      3
18    1   Dogs   Large  young      8
19    2   Dogs   Large  young      4
20    3   Dogs   Large  young      5
21    1   Dogs   Large    old      6
22    2   Dogs   Large    old      6
23    3   Dogs   Large    old      0