How to sort the indices of a pandas multi-index data frame according to a custom order

Here is some code that generates a sample data frame:

import pandas as pd

fruits = pd.DataFrame()
fruits['month'] = ['jan','feb','feb','march','jan','april','april','june','march','march','june','april']
ind_mnth = fruits['month'].values
fruits['fruit'] = ['apple','orange','pear','orange','apple','pear','cherry','pear','orange','cherry','apple','cherry']
ind_fruit = fruits['fruit'].values
fruits['price'] = [30,20,40,25,30,45,60,45,25,55,37,60]
# use month (outer) and fruit (inner) as a two-level index; the columns are kept as well
fruits_grp = fruits.set_index([ind_mnth, ind_fruit], drop=False)

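How can I sort the rows of this multi-index data frame so that, under each outer index (month), the inner index (fruit) follows a custom order, while rows with the same outer index stay grouped together?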

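One way is to create an ordered Categorical from the fruit column with whatever order you want, then call set_index with a MultiIndex.from_arrays built from one array per level, and finally call sort_index: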

# custom order
ord_fruit = ['apple', 'pear', 'cherry', 'orange']
# create an ordered Categorical for the fruits
f = pd.Categorical(fruits['fruit'], categories=ord_fruit, ordered=True)

# get month values; these could also be given a custom order, same idea as above
m = fruits['month'].to_numpy()

# get the result
fruits_grp = fruits.set_index(pd.MultiIndex.from_arrays([m,f])).sort_index()
print(fruits_grp)
              month   fruit  price
april pear    april    pear     45 # pear before cherry
      cherry  april  cherry     60
      cherry  april  cherry     60
feb   pear      feb    pear     40
      orange    feb  orange     20
jan   apple     jan   apple     30
      apple     jan   apple     30
june  apple    june   apple     37
      pear     june    pear     45
march cherry  march  cherry     55 # cherry before orange
      orange  march  orange     25
      orange  march  orange     25

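Note that sort_index will sort the other level (here, month) alphabetically. If you don't want that, you can define your own order for each level in the same way.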
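As a minimal sketch of that last point, the month level can be made an ordered Categorical too. The calendar order in ord_month and the name fruits_cal are assumptions added for illustration; they are not part of the original question.

```python
import pandas as pd

# same sample data as in the question
fruits = pd.DataFrame({
    'month': ['jan', 'feb', 'feb', 'march', 'jan', 'april',
              'april', 'june', 'march', 'march', 'june', 'april'],
    'fruit': ['apple', 'orange', 'pear', 'orange', 'apple', 'pear',
              'cherry', 'pear', 'orange', 'cherry', 'apple', 'cherry'],
    'price': [30, 20, 40, 25, 30, 45, 60, 45, 25, 55, 37, 60],
})

# custom orders: ord_month is an assumed calendar order, ord_fruit is from the answer
ord_month = ['jan', 'feb', 'march', 'april', 'june']
ord_fruit = ['apple', 'pear', 'cherry', 'orange']

# one ordered Categorical per index level
m = pd.Categorical(fruits['month'], categories=ord_month, ordered=True)
f = pd.Categorical(fruits['fruit'], categories=ord_fruit, ordered=True)

# sort_index now follows the category order on both levels
fruits_cal = fruits.set_index(pd.MultiIndex.from_arrays([m, f])).sort_index()
print(fruits_cal)
```

With this, the months should come out as jan, feb, march, april, june instead of alphabetically, and the fruits within each month still follow ord_fruit.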