How to combine columns in pandas pivot table?

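I have a pivot table with 3 levels of columns. For each unique pair of mean and std, I want to combine them into a str f"{x.mean}({x.std})", replacing the mean and std columns with a new mean_std_str column.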

Here is a print of the dataframe:

rescore_func   asp                 chemscore  ... goldscore   plp                
tag           best      first           best  ...     first  best      first     
              mean  std  mean  std      mean  ...       std  mean  std  mean  std
dock_func                                     ...                                
asp           65.2  0.7  34.5  2.4      64.0  ...       0.0  64.4  0.7  37.9  0.7
chemscore     59.1  2.0  29.5  2.0      58.0  ...       1.7  58.7  0.7  40.9  2.3
goldscore     68.9  1.7  34.8  4.3      69.7  ...       1.3  68.9  1.3  46.2  0.7
plp           69.3  1.1  35.2  2.0      69.7  ...       2.0  68.9  2.4  39.4  2.9

[4 rows x 16 columns]

Desired output:

rescore_func   asp                             
tag            best       first           ...  
               mean_std   mean_std        ...
dock_func                                 ...                                 
asp            65.2(0.7)  34.5(2.4)       ...
chemscore      59.1(2.0)  29.5(2.0)       ... 
goldscore      68.9(1.7)  34.8(4.3)       ...
plp            69.3(1.1)  35.2(2.0)       ...

[4 rows x 16 columns]

So far I have:

df = df.melt(ignore_index=False).reset_index()
df = df.rename(columns=str).rename(columns={'None':'descr'})

which gives:

    dock_func rescore_func    tag descr  value
0         asp          asp   best  mean   65.2
1   chemscore          asp   best  mean   59.1
2   goldscore          asp   best  mean   68.9
3         plp          asp   best  mean   69.3
4         asp          asp   best   std    0.7
..        ...          ...    ...   ...    ...
59        plp          plp  first  mean   39.4
60        asp          plp  first   std    0.7
61  chemscore          plp  first   std    2.3
62  goldscore          plp  first   std    0.7
63        plp          plp  first   std    2.9

[64 rows x 5 columns]

I'm stuck on how to combine the mean and std together before re-pivoting the data...

DataFrame.reorder_levels will make this a bit easier.

Here is some sample data:

import numpy as np
import pandas as pd


# Row labels; the same labels are reused for the top column level.
index = pd.Index(["asp", "chemscore", "goldscore", "plp"], name="dock_func")
# Three column levels: dock_func, tag, and an unnamed mean/std level.
columns = pd.MultiIndex.from_product(
    [index, pd.Index(["best", "first"], name="tag"), ("mean", "std")]
)

# Random values, so the numbers will differ from run to run.
df = pd.DataFrame(
    np.random.random(size=(4, 16)),
    index=index,
    columns=columns,
).round(1)

df looks like:

dock_func  asp                 chemscore                 goldscore                  plp
tag       best      first           best      first           best      first      best      first
          mean  std  mean  std      mean  std  mean  std      mean  std  mean  std mean  std  mean  std
dock_func
asp        0.5  0.6   0.4  0.2       0.7  0.7   0.8  0.1       0.2  0.5   0.6  0.7  0.5  0.2   0.2  0.7
chemscore  0.0  0.7   0.9  0.2       0.3  0.3   0.4  0.8       0.3  0.4   0.2  0.8  0.5  0.5   0.4  0.2
goldscore  0.5  0.7   0.8  0.0       0.2  0.8   0.1  0.2       0.6  0.1   0.4  0.2  0.8  0.2   0.8  0.3
plp        1.0  0.6   0.6  0.8       0.8  0.6   0.3  1.0       0.7  0.2   0.8  0.2  0.2  0.2   0.7  0.2

Then run the following:

# Move the mean/std level to the top and convert every value to a string.
df = df.reorder_levels([2, 0, 1], axis=1).astype(str)
# df["mean"] and df["std"] now share the same (dock_func, tag) columns.
df = df["mean"] + "(" + df["std"] + ")"

df is now:

dock_func       asp           chemscore           goldscore                 plp
tag            best     first      best     first      best     first      best     first
dock_func
asp        0.5(0.6)  0.4(0.2)  0.7(0.7)  0.8(0.1)  0.2(0.5)  0.6(0.7)  0.5(0.2)  0.2(0.7)
chemscore  0.0(0.7)  0.9(0.2)  0.3(0.3)  0.4(0.8)  0.3(0.4)  0.2(0.8)  0.5(0.5)  0.4(0.2)
goldscore  0.5(0.7)  0.8(0.0)  0.2(0.8)  0.1(0.2)  0.6(0.1)  0.4(0.2)  0.8(0.2)  0.8(0.3)
plp        1.0(0.6)  0.6(0.8)  0.8(0.6)  0.3(1.0)  0.7(0.2)  0.8(0.2)  0.2(0.2)  0.7(0.2)
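
The result above has only two column levels, while the desired output in the question keeps a third level labelled mean_std. If that matters, one possible way to put it back is to rebuild the column MultiIndex after combining. This is only a sketch under a few assumptions of mine: the sample frame is regenerated with a fixed seed, the top column level is named rescore_func to match the question, and the names `as_str` and `combined` are purely illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame with the same layout as the data above (seeded for repeatability).
rng = np.random.default_rng(0)
index = pd.Index(["asp", "chemscore", "goldscore", "plp"], name="dock_func")
columns = pd.MultiIndex.from_product(
    [
        index.rename("rescore_func"),  # assumed name, taken from the question
        pd.Index(["best", "first"], name="tag"),
        ("mean", "std"),
    ]
)
df = pd.DataFrame(rng.random(size=(4, 16)), index=index, columns=columns).round(1)

# Same two steps as before: bring mean/std to the top, then join the strings.
as_str = df.reorder_levels([2, 0, 1], axis=1).astype(str)
combined = as_str["mean"] + "(" + as_str["std"] + ")"

# Append a constant third level "mean_std" under every (rescore_func, tag) pair.
combined.columns = pd.MultiIndex.from_tuples(
    [(*col, "mean_std") for col in combined.columns],
    names=[*combined.columns.names, None],
)

print(combined)
```

Note that the combined frame has 8 columns rather than 16, since each mean/std pair collapses into a single string column.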