Pandas: select rows by random groups while keeping all of the group's variables

Question

My DataFrame looks like this:

id  std     number 
A   1       1
A   0       12
B   123.45  34
B   1       56 
B   12      78
C   134     90
C   1234    100
C   12345   111

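I want to select a random subset of the id values while keeping every row that belongs to each selected id, so that the resulting DataFrame looks like this: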

id  std     number 
A   1       1
A   0       12
C   134     90
C   1234    100
C   12345   111

I tried

size = 1000   
replace = True  
fn = lambda obj: obj.loc[np.random.choice(obj.index, size, replace),:]
df2 = df1.groupby('Id', as_index=False).apply(fn)

and

df2 = df1.sample(n=1000).groupby('id')

but obviously that did not work. Any help would be greatly appreciated.

Answer 1

You need to pick the random ids first, then compare the original id column against them with Series.isin in boolean indexing:

#number of groups
N = 2
df2 = df1[df1['id'].isin(df1['id'].drop_duplicates().sample(N))]
print (df2)
  id      std  number
0  A      1.0       1
1  A      0.0      12
5  C    134.0      90
6  C   1234.0     100
7  C  12345.0     111
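
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds df1 from the table in the question; the random_state argument is an assumption added here only to make the draw repeatable and is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the table in the question.
df1 = pd.DataFrame({
    "id": ["A", "A", "B", "B", "B", "C", "C", "C"],
    "std": [1, 0, 123.45, 1, 12, 134, 1234, 12345],
    "number": [1, 12, 34, 56, 78, 90, 100, 111],
})

N = 2  # number of groups to keep

# Draw N distinct ids. random_state is an assumption added for reproducibility;
# drop it to get a different selection on every run.
chosen_ids = df1["id"].drop_duplicates().sample(N, random_state=0)

# Keep every row whose id is among the chosen ones.
df2 = df1[df1["id"].isin(chosen_ids)]
print(df2)
```

Because sample draws without replacement by default, this always keeps exactly N groups, and every row of each chosen group survives with its original index.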

Alternatively:

import numpy as np

N = 2
# replace=False so the same id cannot be drawn twice
df2 = df1[df1['id'].isin(np.random.choice(df1['id'].unique(), N, replace=False))]
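
One thing to watch with the NumPy route: np.random.choice samples with replacement unless told otherwise, so the same id can come up twice and fewer than N groups end up in the result. Below is a rough sketch that wraps the selection in a small helper and samples without replacement. The name sample_groups and its seed parameter are hypothetical, introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
df1 = pd.DataFrame({
    "id": ["A", "A", "B", "B", "B", "C", "C", "C"],
    "std": [1, 0, 123.45, 1, 12, 134, 1234, 12345],
    "number": [1, 12, 34, 56, 78, 90, 100, 111],
})


def sample_groups(df, key, n, seed=None):
    """Return all rows of n randomly chosen distinct groups (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # replace=False guarantees n distinct group labels; NumPy raises a
    # ValueError if n exceeds the number of available groups.
    chosen = rng.choice(df[key].unique(), size=n, replace=False)
    return df[df[key].isin(chosen)]


df2 = sample_groups(df1, "id", 2, seed=42)
print(df2)
```

Passing seed=None (the default in this sketch) gives a fresh random selection on each call.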

