How to combine columns in a pandas pivot table?
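I have a pivot table with 3 levels of columns. For each unique mean and std pair, I want to combine them into a single str f"{x.mean}({x.std})", replacing the mean and std columns with a new mean_std_str column.
Here is a printout of the dataframe: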
rescore_func asp chemscore ... goldscore plp
tag best first best ... first best first
mean std mean std mean ... std mean std mean std
dock_func ...
asp 65.2 0.7 34.5 2.4 64.0 ... 0.0 64.4 0.7 37.9 0.7
chemscore 59.1 2.0 29.5 2.0 58.0 ... 1.7 58.7 0.7 40.9 2.3
goldscore 68.9 1.7 34.8 4.3 69.7 ... 1.3 68.9 1.3 46.2 0.7
plp 69.3 1.1 35.2 2.0 69.7 ... 2.0 68.9 2.4 39.4 2.9
[4 rows x 16 columns]
Desired output:
rescore_func asp
tag best first ...
mean_std mean_std ...
dock_func ...
asp 65.2(0.7) 34.5(2.4) ...
chemscore 59.1(2.0) 29.5(2.0) ...
goldscore 68.9(1.7) 34.8(4.3) ...
plp 69.3(1.1) 35.2(2.0) ...
[4 rows x 16 columns]
So far I have:
df = df.melt(ignore_index=False).reset_index()
df = df.rename(columns=str).rename(columns={'None':'descr'})
which gives:
dock_func rescore_func tag descr value
0 asp asp best mean 65.2
1 chemscore asp best mean 59.1
2 goldscore asp best mean 68.9
3 plp asp best mean 69.3
4 asp asp best std 0.7
.. ... ... ... ... ...
59 plp plp first mean 39.4
60 asp plp first std 0.7
61 chemscore plp first std 2.3
62 goldscore plp first std 0.7
63 plp plp first std 2.9
[64 rows x 5 columns]
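I'm stuck on how to combine the mean and std together before re-pivoting the data...
DataFrame.reorder_levels will make this a bit easier: once the mean/std level is moved to the top of the column index, the two halves can be selected and concatenated directly, with no need to melt and re-pivot.
Here is some sample data: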
import numpy as np
import pandas as pd
index = pd.Index(["asp", "chemscore", "goldscore", "plp"], name="dock_func")
columns = pd.MultiIndex.from_product(
    [index, pd.Index(["best", "first"], name="tag"), ("mean", "std")]
)
df = pd.DataFrame(
    np.random.random(size=(4, 16)),
    index=index,
    columns=columns,
).round(1)
df
which looks like:
dock_func asp chemscore goldscore plp
tag best first best first best first best first
mean std mean std mean std mean std mean std mean std mean std mean std
dock_func
asp 0.5 0.6 0.4 0.2 0.7 0.7 0.8 0.1 0.2 0.5 0.6 0.7 0.5 0.2 0.2 0.7
chemscore 0.0 0.7 0.9 0.2 0.3 0.3 0.4 0.8 0.3 0.4 0.2 0.8 0.5 0.5 0.4 0.2
goldscore 0.5 0.7 0.8 0.0 0.2 0.8 0.1 0.2 0.6 0.1 0.4 0.2 0.8 0.2 0.8 0.3
plp 1.0 0.6 0.6 0.8 0.8 0.6 0.3 1.0 0.7 0.2 0.8 0.2 0.2 0.2 0.7 0.2
Then run the following:
# move the mean/std level to the top and convert every value to a string
df = df.reorder_levels([2, 0, 1], axis=1).astype(str)
# df["mean"] and df["std"] now share the same two-level columns, so they align
df = df["mean"] + "(" + df["std"] + ")"
and df is now:
dock_func asp chemscore goldscore plp
tag best first best first best first best first
dock_func
asp 0.5(0.6) 0.4(0.2) 0.7(0.7) 0.8(0.1) 0.2(0.5) 0.6(0.7) 0.5(0.2) 0.2(0.7)
chemscore 0.0(0.7) 0.9(0.2) 0.3(0.3) 0.4(0.8) 0.3(0.4) 0.2(0.8) 0.5(0.5) 0.4(0.2)
goldscore 0.5(0.7) 0.8(0.0) 0.2(0.8) 0.1(0.2) 0.6(0.1) 0.4(0.2) 0.8(0.2) 0.8(0.3)
plp 1.0(0.6) 0.6(0.8) 0.8(0.6) 0.3(1.0) 0.7(0.2) 0.8(0.2) 0.2(0.2) 0.7(0.2)
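Note that the result above has only two column levels, while the desired output in the question keeps a third level labelled mean_std. If that level matters, and if each number should always print with one decimal place, something along these lines might work. It is only a sketch: the sample frame, the seed, and the names `fmt`, `combined`, and `result` are made up for illustration and are not part of the original question or answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the frame in the question
rng = np.random.default_rng(0)
funcs = ["asp", "chemscore", "goldscore", "plp"]
columns = pd.MultiIndex.from_product(
    [funcs, ["best", "first"], ["mean", "std"]],
    names=["rescore_func", "tag", None],
)
df = pd.DataFrame(
    rng.random((4, 16)) * 100,
    index=pd.Index(funcs, name="dock_func"),
    columns=columns,
).round(1)


def fmt(frame):
    """Render every cell of a numeric frame as a string with one decimal."""
    return frame.apply(lambda col: col.map("{:.1f}".format))


# Pull out the mean and std halves; xs drops the innermost level,
# so both frames end up with identical (rescore_func, tag) columns
mean = df.xs("mean", axis=1, level=2)
std = df.xs("std", axis=1, level=2)

# Element-wise string concatenation: "mean(std)"
combined = fmt(mean) + "(" + fmt(std) + ")"

# Re-attach a third column level named "mean_std": concat puts the key
# on the outside, so move it back to the innermost position
result = pd.concat({"mean_std": combined}, axis=1).reorder_levels([1, 2, 0], axis=1)
print(result)
```

Formatting through `"{:.1f}".format` rather than `astype(str)` keeps the number of decimals fixed, which may be preferable if the values have not already been rounded.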