Python pandas sample without mixing index

Question

I want to apply the pandas sample function independently to each index value of a data frame. This can be done with a for loop like this:

import pandas

df = pandas.DataFrame({'something': [3,4,2,2,6,7], 'n': [1,1,2,2,3,3]})
df.set_index(['n'], inplace=True)

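# Sample each index group separately, then stitch the pieces back together.
# DataFrame.append was removed in pandas 2.0, so the pieces are combined with pandas.concat.
pieces = []
for i in sorted(set(df.index)):
    # df.loc[[i]] stays a DataFrame even when a group has only one row
    pieces.append(df.loc[[i]].sample(frac=1, replace=True))
resampled_as_I_want_df = pandas.concat(pieces)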

print(resampled_as_I_want_df)

Let me explain in plain terms. The df data frame looks like this:

   something
n           
1          3
1          4
2          2
2          2
3          6
3          7

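Here we see three "index groups" with the values 1, 2 and 3. What I want is to apply the sample function so that the new data frame keeps the same index, without the index values being mixed across groups, and so that the sampling is performed within each group as if each group were an independent data frame.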

Is there a way to avoid the for loop? For large data frames it is a bottleneck.

Answer 1

Use df.groupby(level=0).sample(frac=1, replace=True).
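As a minimal sketch of how this one-liner might be applied to the data frame from the question: the variable name resampled and the random_state argument are additions for illustration only (the seed just makes the draw repeatable), and the groupby sample method assumes pandas 1.1 or later.

```python
import pandas as pd

# Same data as in the question, indexed by 'n'.
df = pd.DataFrame({'something': [3, 4, 2, 2, 6, 7], 'n': [1, 1, 2, 2, 3, 3]})
df = df.set_index('n')

# Group on the index (level=0) and resample with replacement inside each group.
# frac=1 draws as many rows as the group has, so every group keeps its size.
# random_state is an illustrative assumption; drop it for a fresh draw each run.
resampled = df.groupby(level=0).sample(frac=1, replace=True, random_state=0)

print(resampled)
```

Because the rows are drawn per group, a value from group 1 should never show up under index 2 or 3, which is the "no mixing" behaviour asked for. The index labels of the result come out group by group, here 1, 1, 2, 2, 3, 3.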
