How to randomly select rows from a data set using pandas?

Question

I have a data set with 36k rows. I want to randomly select 9k of those rows using pandas. How can I do this?

Answer 1

I think you can use sample, either with a fixed count of 9k rows or with a fraction of 25% of the rows:

df.sample(n=9000)

Or:

df.sample(frac=0.25)
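As a minimal sketch of both calls, assuming a synthetic 36k-row frame in place of the real data (the `value` column and the `random_state` seed are illustrative choices, not part of the question):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 36k-row data set in the question.
df = pd.DataFrame({"value": np.arange(36_000)})

# Draw exactly 9000 rows; sample() draws without replacement by default,
# and random_state makes the draw repeatable.
by_count = df.sample(n=9000, random_state=0)

# Draw 25% of the rows, which is also 9000 rows for a 36k-row frame.
by_fraction = df.sample(frac=0.25, random_state=0)

print(len(by_count), len(by_fraction))
```

The result keeps the original index labels, so `by_count.index` tells you which rows were picked.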

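Another solution is to draw a random sample of the index with numpy.random.choice and then select those rows with loc. The index must be unique, and replace=False is needed so that no row is drawn twice:

df = df.loc[np.random.choice(df.index, size=9000, replace=False)]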

Solution for a non-unique index, sampling row positions and selecting them with iloc:

df = df.iloc[np.random.choice(np.arange(len(df)), size=9000, replace=False)]
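One detail worth checking with the numpy approach: choice samples with replacement unless replace=False is passed, so the same row can be picked more than once. A rough sketch of the difference, assuming a synthetic frame with repeated index labels and using numpy's Generator API (`default_rng`) rather than the legacy `np.random.choice` function:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical frame whose index labels repeat (each label appears twice).
df = pd.DataFrame({"value": np.arange(36_000)}, index=np.arange(36_000) // 2)

# Sample row positions rather than labels, so a non-unique index is fine.
positions = rng.choice(len(df), size=9000, replace=False)
sample = df.iloc[positions]

# For comparison: the default (replace=True) can pick a position more than once.
with_repeats = rng.choice(len(df), size=9000)

print(len(np.unique(positions)), len(np.unique(with_repeats)))
```

The first printed count is 9000; the second is smaller because some positions were drawn repeatedly.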

Answer 2

numpy

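i = np.random.permutation(np.arange(len(df)))  # shuffled row positions
idx = i[:9000]  # the first 9000 positions form a sample without replacement
pd.DataFrame(df.values[idx], df.index[idx], df.columns)  # rebuild the frame, keeping the column labels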
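Going through df.values converts the frame to a single numpy array, so mixed column types collapse to a common dtype (often object). A possible variant of the same permutation idea that selects with iloc instead and so keeps each column's dtype; the frame and its `id` and `label` columns are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with mixed column types.
df = pd.DataFrame({"id": np.arange(36_000), "label": ["a", "b"] * 18_000})

rng = np.random.default_rng(0)

# Shuffle all row positions and keep the first 9000: a sample without replacement.
idx = rng.permutation(len(df))[:9000]

# iloc selects by position and preserves column names and dtypes.
sample = df.iloc[idx]

print(sample.dtypes)
```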
