pandas sort values in pivot table
I have a dataframe and I want to group all rows by id, where a row with country = russia and month = march is followed by a row with country != russia.
Input dataframe:
import pandas as pd
import numpy as np
data = {'fruit': ['pear','pear','pear','banana', 'banana', 'banana', 'apricot', 'pear','watermelon','pear','banana', 'banana', 'banana','banana', 'melon', 'cherry','banana', 'kiwi', 'kiwi', 'kiwi'],
'country': ['france','france', 'france', 'russia', 'russia', 'russia','russia', 'france','russia','usa', 'russia', 'ghana','ghana','ghana', 'ghana', 'albania','andorra', 'russia', 'russia', 'russia'],
'id': ['01','01','01','01','01','01','02','02','03','03','011', '011', '011','011', '6', '6','6', '5', '5', '5'],
'id1': ['01','01','01','01','01','01','02','02','03','03','011', '011', '011','011', '6', '6','6', '5', '5', '5'],
'month': ['january','september','january','january','september','january','january', 'september','march','march', 'november', 'march', 'january','january', 'march', 'january','july', 'march', 'march', 'april']
}
df = pd.DataFrame(data, columns = ['fruit','country', 'id','id1', 'month'])
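I made a pivot table with pd.pivot_table(df, values='id', index=['fruit','country'], columns='id1', aggfunc='count'),
but I get many useless rows that contain only NaN
or very small numbers.
How can I filter the pivot table so that only rows containing a number of at least 3 remain? Can anyone see the problem?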
I need to get this dataframe:
data = {'fruit': ['banana', 'banana', 'kiwi','pear'],
'country': [ 'ghana','russia','russia','france'],
'01': [np.nan,3,np.nan,3],
'011': [3,1,np.nan,np.nan],
'5': [np.nan,np.nan,3,np.nan]
}
df = pd.DataFrame(data, columns = ['fruit','country', '01', '011','5'])
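The sparsity described in the question is easy to confirm. A minimal sketch, assuming pandas is installed, that rebuilds the asker's pivot table from the input data and counts how many (fruit, country) rows reach 3 in any id1 column; the names `pivot` and `row_max` are illustrative only.

```python
import pandas as pd

# Input data copied from the question
data = {'fruit': ['pear', 'pear', 'pear', 'banana', 'banana', 'banana', 'apricot', 'pear', 'watermelon', 'pear',
                  'banana', 'banana', 'banana', 'banana', 'melon', 'cherry', 'banana', 'kiwi', 'kiwi', 'kiwi'],
        'country': ['france', 'france', 'france', 'russia', 'russia', 'russia', 'russia', 'france', 'russia', 'usa',
                    'russia', 'ghana', 'ghana', 'ghana', 'ghana', 'albania', 'andorra', 'russia', 'russia', 'russia'],
        'id': ['01', '01', '01', '01', '01', '01', '02', '02', '03', '03',
               '011', '011', '011', '011', '6', '6', '6', '5', '5', '5'],
        'month': ['january', 'september', 'january', 'january', 'september', 'january', 'january', 'september',
                  'march', 'march', 'november', 'march', 'january', 'january', 'march', 'january', 'july',
                  'march', 'march', 'april']}
data['id1'] = list(data['id'])  # in the question, id1 is an exact copy of id
df = pd.DataFrame(data, columns=['fruit', 'country', 'id', 'id1', 'month'])

# Same call as in the question: one row per (fruit, country), one column per id1;
# each cell counts matching rows, and combinations that never occur become NaN
pivot = pd.pivot_table(df, values='id', index=['fruit', 'country'], columns='id1', aggfunc='count')

# Largest count in each row (NaN is ignored by max)
row_max = pivot.max(axis=1)
print(pivot.shape)           # (10, 6)
print((row_max >= 3).sum())  # 4
```

Only 4 of the 10 rows contain a count of 3 or more; the other 6 rows hold a single 1 and NaN everywhere else.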
If df2 is your pivot table, you can do:
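# (df2 >= 3) is True where a cell holds a count of at least 3 (NaN compares as False)
row_mask = np.any((df2 >= 3).values, axis=1)  # rows with at least one such cell
col_mask = np.any((df2 >= 3).values, axis=0)  # columns with at least one such cell
df2.loc[row_mask, col_mask]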
id1              01  011    5
fruit  country
banana ghana    NaN  3.0  NaN
       russia   3.0  1.0  NaN
kiwi   russia   NaN  NaN  3.0
pear   france   3.0  NaN  NaN
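A self-contained sketch of the same idea, assuming pandas is installed. It uses `DataFrame.any` on the boolean frame instead of `np.any` on `.values`; both produce the same row and column masks. The names `hits` and `result` are illustrative.

```python
import pandas as pd

# Input data copied from the question
data = {'fruit': ['pear', 'pear', 'pear', 'banana', 'banana', 'banana', 'apricot', 'pear', 'watermelon', 'pear',
                  'banana', 'banana', 'banana', 'banana', 'melon', 'cherry', 'banana', 'kiwi', 'kiwi', 'kiwi'],
        'country': ['france', 'france', 'france', 'russia', 'russia', 'russia', 'russia', 'france', 'russia', 'usa',
                    'russia', 'ghana', 'ghana', 'ghana', 'ghana', 'albania', 'andorra', 'russia', 'russia', 'russia'],
        'id': ['01', '01', '01', '01', '01', '01', '02', '02', '03', '03',
               '011', '011', '011', '011', '6', '6', '6', '5', '5', '5'],
        'month': ['january', 'september', 'january', 'january', 'september', 'january', 'january', 'september',
                  'march', 'march', 'november', 'march', 'january', 'january', 'march', 'january', 'july',
                  'march', 'march', 'april']}
data['id1'] = list(data['id'])  # in the question, id1 is an exact copy of id
df = pd.DataFrame(data, columns=['fruit', 'country', 'id', 'id1', 'month'])
df2 = pd.pivot_table(df, values='id', index=['fruit', 'country'], columns='id1', aggfunc='count')

# True where a single cell holds a count of at least 3 (NaN >= 3 evaluates to False)
hits = df2 >= 3
# Keep only the rows and the columns that contain at least one such cell
result = df2.loc[hits.any(axis=1), hits.any(axis=0)]
print(result)
```

Because the column mask is computed too, id1 columns such as 02 (whose largest count is 1) disappear, which is what makes this output match the dataframe requested in the question.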
Is this the result you want? It keeps "the rows with numbers not less than 3", but it is not the same as your expected result.
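# df is the input dataframe from the question
df = df.pivot_table(index=['fruit','country'], columns='id1', values='id', aggfunc='count')
# Sum the counts in each row (NaN is skipped)
df['total'] = df.sum(axis=1)
# Drop the rows whose total is below 3
df.drop(df.loc[df['total']<3].index, inplace=True)
# Drop the id1 columns that are now NaN in every remaining row
df.dropna(how='all', axis=1, inplace=True)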
Output
id1              01  011   02    5  total
fruit  country
banana ghana    NaN  3.0  NaN  NaN    3.0
banana russia   3.0  1.0  NaN  NaN    4.0
kiwi   russia   NaN  NaN  NaN  3.0    3.0
pear   france   3.0  NaN  1.0  NaN    4.0
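This answer filters on the row total, not on any single cell, which is why column 02 survives (pear/france has a count of 1 there) and why a row with a 1 in three different id1 columns would also pass. A sketch of the same total-based filter written with boolean indexing instead of `drop(..., inplace=True)`, assuming pandas is installed; `pivot` and `out` are illustrative names.

```python
import pandas as pd

# Input data copied from the question
data = {'fruit': ['pear', 'pear', 'pear', 'banana', 'banana', 'banana', 'apricot', 'pear', 'watermelon', 'pear',
                  'banana', 'banana', 'banana', 'banana', 'melon', 'cherry', 'banana', 'kiwi', 'kiwi', 'kiwi'],
        'country': ['france', 'france', 'france', 'russia', 'russia', 'russia', 'russia', 'france', 'russia', 'usa',
                    'russia', 'ghana', 'ghana', 'ghana', 'ghana', 'albania', 'andorra', 'russia', 'russia', 'russia'],
        'id': ['01', '01', '01', '01', '01', '01', '02', '02', '03', '03',
               '011', '011', '011', '011', '6', '6', '6', '5', '5', '5'],
        'month': ['january', 'september', 'january', 'january', 'september', 'january', 'january', 'september',
                  'march', 'march', 'november', 'march', 'january', 'january', 'march', 'january', 'july',
                  'march', 'march', 'april']}
data['id1'] = list(data['id'])  # in the question, id1 is an exact copy of id
df = pd.DataFrame(data, columns=['fruit', 'country', 'id', 'id1', 'month'])
pivot = df.pivot_table(index=['fruit', 'country'], columns='id1', values='id', aggfunc='count')

# Add the row totals of the counts (sum skips NaN) as a 'total' column
out = pivot.assign(total=pivot.sum(axis=1))
# Keep rows whose total is at least 3, then drop id1 columns that are NaN in every kept row
out = out[out['total'] >= 3].dropna(how='all', axis=1)
print(out)
```

With this data, the total-based filter and the per-cell filter happen to select the same four rows; they differ only in which columns remain.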