How to group a dataframe by one column and transpose based on another column

Question

I have a dataframe with these values:

col_1	Timestamp	data_1	data_2
aaa	22/12/2001	0.21	0.2
abb	22/12/2001	0.20	0
acc	22/12/2001	0.12	0.19
aaa	23/12/2001	0.23	0.21
abb	23/12/2001	0.32	0.18
acc	23/12/2001	0.52	0.20

I need to group the dataframe by Timestamp and add one column per col_1 value for each of data_1 and data_2, like this:

Timestamp	aaa_data_1	abb_data_1	acc_data_1	aaa_data_2	abb_data_2	acc_data_2
22/12/2001	0.21	0.20	0.12	0.2	0	0.19
23/12/2001	0.23	0.32	0.52	0.21	0.18	0.20

I can group by Timestamp, but I can't find a way to update/add the columns.

And with df.pivot(index='Timestamp', columns='col_1'), I get:

Timestamp	aaa_data_1	abb_data_1	acc_data_1	aaa_data_2	abb_data_2	acc_data_2
22/12/2001			0.12			0.19
22/12/2001		0.20			0
22/12/2001	0.21			0.2
23/12/2001			0.52			0.20
23/12/2001		0.32			0.18
23/12/2001	0.23			0.21

Answer 1

All you need is a pivot and a column rename:

result = df.pivot(index='Timestamp', columns='col_1')
result.columns = [f'{col_1}_{data}' for data, col_1 in result.columns]
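
For a self-contained check, here is a minimal sketch that rebuilds the sample frame from the question and applies the same pivot and rename. The final reset_index() call is an addition of mine (not part of the answer above) to turn Timestamp back into an ordinary column, as in the desired output.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col_1": ["aaa", "abb", "acc", "aaa", "abb", "acc"],
    "Timestamp": ["22/12/2001"] * 3 + ["23/12/2001"] * 3,
    "data_1": [0.21, 0.20, 0.12, 0.23, 0.32, 0.52],
    "data_2": [0.20, 0.00, 0.19, 0.21, 0.18, 0.20],
})

# Pivot without `values`: every remaining column is spread out, giving
# MultiIndex columns of the form (data column, col_1 value).
result = df.pivot(index="Timestamp", columns="col_1")

# Flatten each (data, col_1) pair into a single "<col_1>_<data>" label.
result.columns = [f"{col_1}_{data}" for data, col_1 in result.columns]

# Optional: make Timestamp a regular column again.
result = result.reset_index()
print(result)
```

Each Timestamp now occupies exactly one row, with one column per combination of col_1 value and data column.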

Answer 2

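@CodeDifferent's answer is sufficient, since your data needs no aggregation. Another option is the development version of pivot_wider from pyjanitor (its functions are wrappers around pandas functions):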

# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import pandas as pd
import janitor as jn
df.pivot_wider(index='Timestamp', 
               names_from='col_1', 
               levels_order=['col_1', None],
               names_sep='_')

    Timestamp  aaa_data_1  abb_data_1  acc_data_1  aaa_data_2  abb_data_2  acc_data_2
0  22/12/2001        0.21        0.20        0.12        0.20        0.00        0.19
1  23/12/2001        0.23        0.32        0.52        0.21        0.18        0.20

This will fail if the combination of index and names_from contains duplicates; in that case, you can use pivot_table, which handles duplicates by aggregating them:

(df.pivot_table(index='Timestamp', columns='col_1')
   .swaplevel(axis = 1)
   .pipe(lambda df: df.set_axis(df.columns.map('_'.join), axis =1))
)
 
            aaa_data_1  abb_data_1  acc_data_1  aaa_data_2  abb_data_2  acc_data_2
Timestamp                                                                         
22/12/2001        0.21        0.20        0.12        0.20        0.00        0.19
23/12/2001        0.23        0.32        0.52        0.21        0.18        0.20
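
To illustrate the difference on duplicates, here is a small sketch with made-up values (not the question's data) in which the pair ('22/12/2001', 'aaa') appears twice. As far as I know, pivot raises a ValueError in this situation, while pivot_table aggregates the duplicates; aggfunc='mean' is the default and is spelled out here only for clarity.

```python
import pandas as pd

# Hypothetical data: ('22/12/2001', 'aaa') occurs twice.
df = pd.DataFrame({
    "col_1": ["aaa", "aaa", "abb"],
    "Timestamp": ["22/12/2001", "22/12/2001", "22/12/2001"],
    "data_1": [0.25, 0.75, 0.10],
    "data_2": [1.0, 3.0, 5.0],
})

# pivot cannot reshape when an index/columns pair is duplicated.
try:
    df.pivot(index="Timestamp", columns="col_1")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicated rows instead (mean here).
wide = df.pivot_table(index="Timestamp", columns="col_1", aggfunc="mean")

# Put the col_1 level first, then join both levels with "_".
wide = wide.swaplevel(axis=1)
wide.columns = wide.columns.map("_".join)

print(pivot_failed)
print(wide)
```

If averaging is not what you want for duplicated rows, a different aggfunc (for example 'first' or 'sum') can be passed instead.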

Or use a helper method from pyjanitor for a cleaner method-chaining syntax:

(df.pivot_table(index='Timestamp', columns='col_1')
   .swaplevel(axis = 1)
   .collapse_levels()
)
