How to sort the indices of a pandas multi-index data frame according to a custom order
Here is some code to generate a sample data frame:
import pandas as pd

fruits = pd.DataFrame()
fruits['month'] = ['jan','feb','feb','march','jan','april','april','june','march','march','june','april']
ind_mnth = fruits['month'].values
fruits['fruit'] = ['apple','orange','pear','orange','apple','pear','cherry','pear','orange','cherry','apple','cherry']
ind_fruit = fruits['fruit'].values
fruits['price'] = [30,20,40,25,30,45,60,45,25,55,37,60]
# index by (month, fruit); the arrays are passed directly, so both columns remain in the frame
fruits_grp = fruits.set_index([ind_mnth, ind_fruit], drop=False)
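How can I sort the rows of this multi-index data frame so that, under each outer index (month), the inner index (fruit) follows a custom order, while rows that share the same outer index stay grouped together?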
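One way is to create an ordered categorical from the fruit column, with whatever order you want, then set_index with a MultiIndex.from_arrays built from one array per level, and sort_index as you did.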
# custom order
ord_fruit = ['apple', 'pear', 'cherry', 'orange']
# create an ordered Categorical for the fruits
f = pd.Categorical(fruits['fruit'], categories=ord_fruit, ordered=True)
# get month values; these could also be given a custom order, same idea as above
m = fruits['month'].to_numpy()
# get the result
fruits_grp = fruits.set_index(pd.MultiIndex.from_arrays([m, f])).sort_index()
print(fruits_grp)
               month   fruit  price
april pear     april    pear     45  # pear before cherry
      cherry   april  cherry     60
      cherry   april  cherry     60
feb   pear       feb    pear     40
      orange     feb  orange     20
jan   apple      jan   apple     30
      apple      jan   apple     30
june  apple     june   apple     37
      pear      june    pear     45
march cherry   march  cherry     55  # cherry before orange
      orange   march  orange     25
      orange   march  orange     25
Note that sort_index will sort the other level (month) alphabetically. If you don't want that, you can create your own order for each level in the same way.
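As a minimal sketch of that last point, the same ordered-categorical idea can be applied to the month level too. The calendar-style list `ord_month` and the name `fruits_cal` are assumptions introduced for illustration; they are not part of the original question.

```python
import pandas as pd

fruits = pd.DataFrame({
    'month': ['jan', 'feb', 'feb', 'march', 'jan', 'april',
              'april', 'june', 'march', 'march', 'june', 'april'],
    'fruit': ['apple', 'orange', 'pear', 'orange', 'apple', 'pear',
              'cherry', 'pear', 'orange', 'cherry', 'apple', 'cherry'],
    'price': [30, 20, 40, 25, 30, 45, 60, 45, 25, 55, 37, 60],
})

# custom order per level; ord_month is an assumed calendar order
ord_month = ['jan', 'feb', 'march', 'april', 'june']
ord_fruit = ['apple', 'pear', 'cherry', 'orange']

# one ordered Categorical per index level
m = pd.Categorical(fruits['month'], categories=ord_month, ordered=True)
f = pd.Categorical(fruits['fruit'], categories=ord_fruit, ordered=True)

# sort_index follows the category order of each level, not the alphabet
fruits_cal = fruits.set_index(pd.MultiIndex.from_arrays([m, f])).sort_index()
print(fruits_cal)
```

With both levels categorical, the months should come out as jan, feb, march, april, june, with the fruits inside each month following `ord_fruit`.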