How to shuffle the outer index randomly and the inner index in a different random order in a MultiIndex DataFrame

Question

Here is some code that generates a sample DataFrame:

import pandas as pd

fruits = pd.DataFrame()
fruits['month'] = ['jan', 'feb', 'feb', 'march', 'jan', 'april', 'april', 'june', 'march', 'march', 'june', 'april']
fruits['fruit'] = ['apple', 'orange', 'pear', 'orange', 'apple', 'pear', 'cherry', 'pear', 'orange', 'cherry', 'apple', 'cherry']
ind = fruits.index
ind_mnth = fruits['month'].values
fruits['price'] = [30, 20, 40, 25, 30, 45, 60, 45, 25, 55, 37, 60]
# Outer level: month label; inner level: original row number.
# drop=False keeps 'month' as a regular column as well.
fruits_grp = fruits.set_index([ind_mnth, ind], drop=False)

How can I shuffle the outer index randomly, and the inner index in a different random order, in this MultiIndex DataFrame?

Answer 1

Assume this DataFrame with a MultiIndex as input:

          month   fruit  price
jan   0     jan   apple     30
feb   1     feb  orange     20
      2     feb    pear     40
march 3   march  orange     25
jan   4     jan   apple     30
april 5   april    pear     45
      6   april  cherry     60
june  7    june    pear     45
march 8   march  orange     25
      9   march  cherry     55
june  10   june   apple     37
april 11  april  cherry     60

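First shuffle all rows of the DataFrame, then regroup them by month by selecting the outer-level labels in a separately shuffled order. Selecting with `.loc` and a list of outer labels returns the rows one month at a time, in the order of that list, while keeping the shuffled row order inside each month: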

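import numpy as np

np.random.seed(0)
# Distinct outer-level labels (months), shuffled in place
idx0 = np.unique(fruits_grp.index.get_level_values(0))
np.random.shuffle(idx0)
# Shuffle all rows, then pick the months in their shuffled order
fruits_grp.sample(frac=1).loc[idx0]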

Output:

          month   fruit  price
jan   0     jan   apple     30
      4     jan   apple     30
april 6   april  cherry     60
      5   april    pear     45
      11  april  cherry     60
feb   1     feb  orange     20
      2     feb    pear     40
june  10   june   apple     37
      7    june    pear     45
march 8   march  orange     25
      9   march  cherry     55
      3   march  orange     25
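
As a rough sketch, the same two-step idea can be wrapped in a reusable function. The name `shuffle_levels` and its `seed` parameter are hypothetical (they are not part of pandas), and this version swaps the global `np.random.seed` for a local `np.random.default_rng` generator, so the exact row order will differ from the output above:

```python
import numpy as np
import pandas as pd


def shuffle_levels(df, seed=None):
    """Shuffle outer-level groups and the rows inside each group independently.

    Hypothetical helper, not a pandas function. Assumes `df` has a
    MultiIndex whose first level is the grouping (outer) level.
    """
    rng = np.random.default_rng(seed)
    # Random order for the distinct outer-level labels.
    outer = rng.permutation(df.index.get_level_values(0).unique().to_numpy())
    # Independent random order for all rows.
    shuffled_rows = df.iloc[rng.permutation(len(df))]
    # Selecting by the shuffled labels regroups the rows per label,
    # keeping the shuffled row order inside each group.
    return shuffled_rows.loc[outer]


# Rebuild the sample data from the question.
fruits = pd.DataFrame({
    'month': ['jan', 'feb', 'feb', 'march', 'jan', 'april',
              'april', 'june', 'march', 'march', 'june', 'april'],
    'fruit': ['apple', 'orange', 'pear', 'orange', 'apple', 'pear',
              'cherry', 'pear', 'orange', 'cherry', 'apple', 'cherry'],
    'price': [30, 20, 40, 25, 30, 45, 60, 45, 25, 55, 37, 60],
})
fruits_grp = fruits.set_index([fruits['month'].values, fruits.index], drop=False)

result = shuffle_levels(fruits_grp, seed=0)
print(result)
```

Passing a fixed `seed` makes the result reproducible without touching NumPy's global random state; leaving it as `None` gives a different order on each call.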
