How to shuffle the outer index and the inner index of a MultiIndex DataFrame in different random orders
Here is some code that generates a sample DataFrame:
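import pandas as pd

fruits = pd.DataFrame()
fruits['month'] = ['jan', 'feb', 'feb', 'march', 'jan', 'april', 'april', 'june', 'march', 'march', 'june', 'april']
fruits['fruit'] = ['apple', 'orange', 'pear', 'orange', 'apple', 'pear', 'cherry', 'pear', 'orange', 'cherry', 'apple', 'cherry']
ind = fruits.index
ind_mnth = fruits['month'].to_numpy()
fruits['price'] = [30, 20, 40, 25, 30, 45, 60, 45, 25, 55, 37, 60]
# Outer level: month label; inner level: original row number.
# Both are passed as arrays, so the 'month' column stays in the frame.
fruits_grp = fruits.set_index([ind_mnth, ind], drop=False)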
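How can I shuffle the outer index and the inner index of this MultiIndex DataFrame, each in a different random order?
Assuming this DataFrame with a MultiIndex as input: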
          month   fruit  price
jan   0     jan   apple     30
feb   1     feb  orange     20
      2     feb    pear     40
march 3   march  orange     25
jan   4     jan   apple     30
april 5   april    pear     45
      6   april  cherry     60
june  7    june    pear     45
march 8   march  orange     25
      9   march  cherry     55
june  10   june   apple     37
april 11  april  cherry     60
First shuffle the whole DataFrame, then regroup the rows by month by indexing with the outer labels in a random order:
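import numpy as np

np.random.seed(0)
# Unique outer (month) labels, shuffled in place.
idx0 = np.unique(fruits_grp.index.get_level_values(0))
np.random.shuffle(idx0)
# Shuffle every row, then select the months in the shuffled order.
fruits_grp.sample(frac=1).loc[idx0]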
Output:
          month   fruit  price
jan   0     jan   apple     30
      4     jan   apple     30
april 6   april  cherry     60
      5   april    pear     45
      11  april  cherry     60
feb   1     feb  orange     20
      2     feb    pear     40
june  10   june   apple     37
      7    june    pear     45
march 8   march  orange     25
      9   march  cherry     55
      3   march  orange     25
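The same two-step idea can be wrapped in a small function. The following is only a sketch under some assumptions: the helper name `shuffle_outer_and_inner` and its `seed` parameter are hypothetical, and it uses a NumPy `Generator` instead of the global `np.random.seed`, which assumes a pandas version whose `sample` accepts a `Generator` as `random_state` (1.1 or later, as far as I know). It rebuilds the sample frame so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
fruits = pd.DataFrame({
    "month": ["jan", "feb", "feb", "march", "jan", "april",
              "april", "june", "march", "march", "june", "april"],
    "fruit": ["apple", "orange", "pear", "orange", "apple", "pear",
              "cherry", "pear", "orange", "cherry", "apple", "cherry"],
    "price": [30, 20, 40, 25, 30, 45, 60, 45, 25, 55, 37, 60],
})
fruits_grp = fruits.set_index([fruits["month"].to_numpy(), fruits.index], drop=False)


def shuffle_outer_and_inner(df, seed=None):
    """Hypothetical helper: shuffle all rows, then regroup them by a shuffled outer level."""
    rng = np.random.default_rng(seed)
    # Inner shuffle: permuting every row randomizes the row order inside each group.
    shuffled = df.sample(frac=1, random_state=rng)
    # Outer shuffle: a separately drawn random order of the unique outer labels.
    outer = rng.permutation(df.index.get_level_values(0).unique().to_numpy())
    # Selecting by the shuffled labels makes each group contiguous again.
    return shuffled.loc[outer]


result = shuffle_outer_and_inner(fruits_grp, seed=0)
print(result)
```

Passing the same `seed` should reproduce the same ordering, and `seed=None` draws a fresh one on each call. The exact row order may differ from the output shown above, since a different random number generator is used.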