pandas sort values in pivot table

I have a dataframe, and I want to group all rows by id, where rows with country = russia and month = march are followed by rows with country != russia.

Input dataframe:

import pandas as pd
import numpy as np
data = {'fruit': ['pear','pear','pear','banana', 'banana', 'banana', 'apricot', 'pear','watermelon','pear','banana', 'banana', 'banana','banana', 'melon', 'cherry','banana', 'kiwi', 'kiwi', 'kiwi'],
'country': ['france','france', 'france', 'russia', 'russia', 'russia','russia', 'france','russia','usa', 'russia', 'ghana','ghana','ghana', 'ghana', 'albania','andorra', 'russia', 'russia', 'russia'],
'id': ['01','01','01','01','01','01','02','02','03','03','011', '011', '011','011', '6', '6','6', '5', '5', '5'],
'id1': ['01','01','01','01','01','01','02','02','03','03','011', '011', '011','011', '6', '6','6', '5', '5', '5'],
'month': ['january','september','january','january','september','january','january', 'september','march','march', 'november', 'march', 'january','january', 'march', 'january','july', 'march', 'march', 'april']        
}
df = pd.DataFrame(data, columns = ['fruit','country', 'id','id1', 'month'])

I built a pivot table with pd.pivot_table(df, values='id', index=['fruit','country'], columns='id1', aggfunc='count'), but I get many useless rows that contain only NaN or very small counts.

How can I sort the pivot table so that I keep only the rows with a count of at least 3? Can anyone see the problem?

I need to get this dataframe:

data = {'fruit': ['banana', 'banana', 'kiwi','pear'],
'country': [ 'ghana','russia','russia','france'],
'01': [np.nan,3,np.nan,3],
'011': [3,1,np.nan,np.nan],
'5': [np.nan,np.nan,3,np.nan]
}
df = pd.DataFrame(data, columns = ['fruit','country', '01', '011','5'])

If df2 is your pivot table, you can do this:

row_mask = np.any((df2 >= 3).values, axis=1)
col_mask = np.any((df2 >= 3).values, axis=0)
df2.loc[row_mask, col_mask]
            id1  01     011       5
fruit   country             
banana  ghana   NaN     3.0     NaN
        russia  3.0     1.0     NaN
kiwi    russia  NaN     NaN     3.0
pear    france  3.0     NaN     NaN 
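
For readers who want to run the masking idea end to end, here is a minimal sketch. It assumes a small stand-in dataframe (a hypothetical subset that reuses the question's column names, not the full input), and it uses the pandas `any` method on a boolean frame in place of `np.any`. The final `reset_index` step is optional and only there to get `fruit` and `country` back as ordinary columns, as in the desired dataframe.

```python
import pandas as pd

# Hypothetical stand-in data with the same column names as the question.
df = pd.DataFrame({
    'fruit':   ['pear', 'pear', 'pear', 'pear', 'banana', 'banana', 'banana', 'kiwi'],
    'country': ['france', 'france', 'france', 'france', 'ghana', 'ghana', 'ghana', 'russia'],
    'id':      ['01', '01', '01', '02', '011', '011', '011', '5'],
})
df['id1'] = df['id']

df2 = pd.pivot_table(df, values='id', index=['fruit', 'country'],
                     columns='id1', aggfunc='count')

# True where a count reaches the threshold; NaN compares as False.
hit = df2 >= 3

# Keep rows and columns that contain at least one qualifying count.
filtered = df2.loc[hit.any(axis=1), hit.any(axis=0)]

# Optional: move the (fruit, country) index back into regular columns.
flat = filtered.reset_index()
flat.columns.name = None
print(flat)
```

Cells below the threshold stay in the result as long as their row and column each contain at least one qualifying count, which is why the 1.0 for banana/russia survives in the output above.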

Is this the result you want? It does "get the rows with numbers not less than 3", but it differs from the result you showed.

df = df.pivot_table(index=['fruit','country'], columns='id1', values='id', aggfunc='count')
df['total'] = df.sum(axis=1)
df.drop(df.loc[df['total']<3].index, inplace=True)
df.dropna(how='all', axis=1, inplace=True)

Output:

         id1    01  011 02  5   total
fruit   country                 
banana  ghana   NaN 3.0 NaN NaN 3.0
banana  russia  3.0 1.0 NaN NaN 4.0
kiwi    russia  NaN NaN NaN 3.0 3.0
pear    france  3.0 NaN 1.0 NaN 4.0
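
The same row-total filter can also be written without `inplace=True` and without overwriting `df`. The following is only a sketch under assumptions: the helper name `rows_with_total_at_least` and the small sample frame are hypothetical and not part of the answer above.

```python
import pandas as pd

def rows_with_total_at_least(df, threshold=3):
    """Pivot counts per (fruit, country) and keep rows whose total >= threshold.

    Hypothetical helper; the column names follow the question's dataframe.
    """
    pivot = df.pivot_table(index=['fruit', 'country'], columns='id1',
                           values='id', aggfunc='count')
    total = pivot.sum(axis=1)  # NaN cells are skipped, so they add nothing
    return (pivot.assign(total=total)
                 .loc[total >= threshold]
                 .dropna(how='all', axis=1))  # drop columns left entirely NaN

# Hypothetical stand-in data with the same column names as the question.
df = pd.DataFrame({
    'fruit':   ['pear', 'pear', 'pear', 'pear', 'banana', 'banana', 'banana', 'kiwi'],
    'country': ['france', 'france', 'france', 'france', 'ghana', 'ghana', 'ghana', 'russia'],
    'id':      ['01', '01', '01', '02', '011', '011', '011', '5'],
})
df['id1'] = df['id']

result = rows_with_total_at_least(df)
print(result)
```

Because this filters on the row total rather than on individual cells, a row whose counts are each below 3 but add up to 3 or more would be kept, and columns such as 02 remain as long as a surviving row has a value there.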