pandas: transform the 1st MultiIndex level to the row index and the 2nd MultiIndex level to the column index

Question

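I have the following code, which groups a given DataFrame by two columns and counts the size of each group. This produces a DataFrame with a single column and a two-level MultiIndex. I then reshape the result so that the second index level becomes the column index and the first index level stays as the row index.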

import pandas as pd
from random import choice

products=['car', 'pen', 'kitchen', 'bed']
df=pd.DataFrame([{'product': choice(products), 'Changing': choice([True, False])} for _ in range(10000)])

df_grp=df.groupby(['product', 'Changing']).size().to_frame()
print(df_grp)
print('-'*50)

d=[]
for i in df_grp.reset_index()['product'].unique():
    d.append({'product':i, 'not changing': df_grp.loc[(i,False)][0], 'changing': df_grp.loc[(i,True)][0]})
print(pd.DataFrame(d).set_index('product'))

This produces the following output:

                     0
product Changing      
bed     False     1253
        True      1269
car     False     1244
        True      1275
kitchen False     1236
        True      1271
pen     False     1206
        True      1246
--------------------------------------------------
         not changing  changing
product                        
bed              1253      1269
car              1244      1275
kitchen          1236      1271
pen              1206      1246

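This is exactly what I want to achieve. However, my approach seems a bit complicated. Is there a more elegant way to do this, perhaps with a built-in pandas function or a lambda function? My approach also doesn't generalize well, because it always assumes that the second index level is a boolean.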

Answer 1

Try unstack():

df_grp=df.groupby(['product', 'Changing']).size().unstack()
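To see what unstack() does here, a small deterministic sketch may help. The sample rows and the names df_small and counts are made up for illustration and are not from the question. The inner index level (Changing) becomes the column labels; fill_value=0 is an optional extra, assuming you want a 0 rather than NaN when some product/Changing combination never occurs:

```python
import pandas as pd

# Hypothetical sample data, chosen so that 'pen' has no True rows
# and 'bed' has no False rows
df_small = pd.DataFrame({
    'product': ['car', 'car', 'car', 'pen', 'pen', 'bed'],
    'Changing': [True, False, True, False, False, True],
})

# size() returns a Series indexed by (product, Changing);
# unstack() moves the innermost level (Changing) into the columns,
# and fill_value=0 fills combinations that never occurred
counts = df_small.groupby(['product', 'Changing']).size().unstack(fill_value=0)
print(counts)
```

Because unstack() creates one column per distinct value of the inner level, the same call should also work when the second grouping key is not a boolean.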

Finally, rename the columns through the columns attribute:

# Build the names from the column-index name instead of hard coding them
# (as suggested by @Cyttorak); this yields 'not Changing' and 'Changing'
df_grp.columns=[f'{"not "*col}{df_grp.columns.name}' for col in df_grp][::-1]
# OR write the names manually
df_grp.columns=['not changing','changing']

Output of df_grp:

         not changing  changing
product                        
bed              1210      1226
car              1220      1272
kitchen          1238      1277
pen              1267      1290
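Note that assigning a list to columns relies on the column order (False first, then True). A slightly more defensive sketch, assuming you would rather rename by label than by position, passes a mapping to DataFrame.rename(). The names df_small, counts, label_map and renamed, as well as the sample rows, are hypothetical:

```python
import pandas as pd

# Hypothetical sample data for illustration
df_small = pd.DataFrame({
    'product': ['car', 'car', 'pen', 'pen', 'pen'],
    'Changing': [True, False, True, True, False],
})

counts = df_small.groupby(['product', 'Changing']).size().unstack(fill_value=0)

# Rename by label rather than by position, so the result does not
# depend on the order in which the columns happen to appear
label_map = {False: 'not changing', True: 'changing'}
renamed = counts.rename(columns=label_map)
print(renamed)
```

Labels missing from the mapping are left unchanged by rename(), so this variant degrades gracefully if the second level ever contains other values.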

