How to do stratified random sampling in Python pandas with a required sampling percentage per group

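I have a dataset of groups and farmer IDs. I have to select 6 of the 18 farmers using stratified random sampling, where the sampling percentage for each group is given.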

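The group percentages are as follows: M,SC 50%, F,SC 25%, M,ST 20%, F,ST 5%.

Data set: 18 farmers in four groups (printed in full as df further down).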

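Now, using sampling, I have to select 6 farmers: 6 x 0.50 = 3 farmers from group "M,SC", 6 x 0.25 = 1.5, rounded to 2 farmers, from group "F,SC", and 6 x 0.20 = 1.2, rounded to 1 farmer, from group "M,ST". Group "F,ST" (6 x 0.05 = 0.3) rounds to 0 farmers.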
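
The allocation arithmetic above can be checked with a few lines of pandas. This is only a sketch: the names `shares`, `raw` and `counts` are made up for illustration, and `np.rint` rounds halves to the nearest even integer, which happens to give 2 for 1.5 here but would give 2 for 2.5 as well.

```python
import numpy as np
import pandas as pd

N = 6  # total sample size from the question

# Required share of the sample per group (percentages from the question)
shares = pd.Series({"M,SC": 0.50, "F,SC": 0.25, "M,ST": 0.20, "F,ST": 0.05})

# Fractional allocation, then the rounded number of farmers per group
raw = shares * N
counts = np.rint(raw).astype(int)

print(pd.DataFrame({"raw": raw, "farmers": counts}))
print("total:", counts.sum())
```

With these particular percentages the rounded counts add up to 6 again; for other percentages or sample sizes that is not guaranteed, so the total is worth printing.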

This is what I have so far:

df
Out[41]: 
   Group  ID
0   M,SC   1
1   M,SC   2
2   M,SC   3
3   M,SC   4
4   M,SC   5
5   F,SC   6
6   F,SC   7
7   F,SC   8
8   F,SC   9
9   M,ST  10
10  M,ST  11
11  M,ST  12
12  M,ST  13
13  M,ST  14
14  F,ST  15
15  F,ST  16
16  F,ST  17
17  F,ST  18

N=6

df.groupby('Group', group_keys=False).apply(lambda x: x.sample(int(np.rint(N*len(x)/len(df))))).sample(frac=1).reset_index(drop=True)
Out[43]: 
  Group  ID
0  M,ST  14
1  M,SC   3
2  M,ST  10
3  M,SC   2
4  F,ST  15
5  F,SC   7

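Now, I don't know how to apply the given percentages in the sampling, i.e. M,SC group: 50%, F,SC group: 25%, M,ST group: 20% and F,ST group: 5%. The code above selects a sample of N=6 in proportion to the group sizes instead.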
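
To see why the groupby one-liner above does not honour the given percentages, it may help to print the per-group counts it computes. The sketch below rebuilds the 18-row frame from the question and applies the same `np.rint(N*len(x)/len(df))` formula; the variable names `sizes` and `proportional` are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the 18-farmer frame from the question
groups = ["M,SC"] * 5 + ["F,SC"] * 4 + ["M,ST"] * 5 + ["F,ST"] * 4
df = pd.DataFrame({"Group": groups, "ID": range(1, 19)})

N = 6

# Farmers per group, then the proportional allocation used by the one-liner
sizes = df.groupby("Group").size()
proportional = np.rint(N * sizes / len(df)).astype(int)

print(proportional)
```

The groups with 5 farmers get 2 draws each and the groups with 4 farmers get 1 each, which matches the 2/2/1/1 split in the output above rather than the required 3/2/1/0.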

Below is the code I used to solve the problem:

import pandas as pd
import numpy as np

# Map each group to its required sampling proportion
proportions = {'M,SC': 0.5, 'F,SC': 0.25, 'M,ST': 0.2, 'F,ST': 0.05}
df['Proportion'] = df['Group'].map(proportions)

# Number of farmers to draw per group for a total sample of 6 (rounded)
df['Sample'] = round(df['Proportion']*6, 0)

df['Selected Farmers_ID'] = df['Sample'].astype(int)

# Draw that many IDs at random within each group; rows that were not drawn become NaN
df['Selected Farmers_ID'] = df.groupby('Group').apply(lambda g: g['ID'].sample(g['Selected Farmers_ID'].iat[0])).reset_index(level=0)['ID']

# Keep only the selected farmers
df.dropna(subset=['Selected Farmers_ID'], inplace=True)

df
Out[11]: 
   Group  ID  Proportion  Sample  Selected Farmers_ID
1   M,SC   2        0.50     3.0                  2.0
3   M,SC   4        0.50     3.0                  4.0
4   M,SC   5        0.50     3.0                  5.0
5   F,SC   6        0.25     2.0                  6.0
8   F,SC   9        0.25     2.0                  9.0
12  M,ST  13        0.20     1.0                 13.0
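
A more compact variant of the same idea, offered only as a sketch, samples each group directly and concatenates the pieces, so no helper columns are needed. It assumes the per-group counts 3, 2, 1 and 0 derived from the percentages; the names `counts`, `parts` and `sample` are hypothetical, and `random_state` is set only to make the example reproducible.

```python
import pandas as pd

# Rebuild the 18-farmer frame from the question
groups = ["M,SC"] * 5 + ["F,SC"] * 4 + ["M,ST"] * 5 + ["F,ST"] * 4
df = pd.DataFrame({"Group": groups, "ID": range(1, 19)})

# Farmers to draw per group: round(6 * share), i.e. 3, 2, 1 and 0
counts = {"M,SC": 3, "F,SC": 2, "M,ST": 1, "F,ST": 0}

# Sample each group separately, skipping groups that get no draws
parts = [
    grp.sample(n=counts[name], random_state=0)
    for name, grp in df.groupby("Group")
    if counts.get(name, 0) > 0
]

# Shuffle the combined sample and renumber the rows
sample = pd.concat(parts).sample(frac=1, random_state=0).reset_index(drop=True)
print(sample)
```

Which farmers are drawn depends on the seed, so the printed IDs will generally differ from the output above; only the per-group counts are fixed.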