Using Python Pandas, how to do stratified random sampling with a required percentage assigned to each group
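I have a dataset of groups and farmer IDs. I have to select 6 of the 18 farmers using stratified random sampling, where the sampling percentage for each group is given.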
The group percentages are: M,SC 50%, F,SC 25%, M,ST 20% and F,ST 5%.
Dataset: the df shown below.
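Using sampling, I now have to select 6 farmers: 6 x 0.50 = 3 farmers from group "M,SC", 6 x 0.25 = 1.5, rounded to 2 farmers, from group "F,SC", and 6 x 0.20 = 1.2, rounded to 1 farmer, from group "M,ST". Group "F,ST" gets 6 x 0.05 = 0.3, i.e. no farmers.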
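The arithmetic above rounds each group's share separately, which only happens to add up to 6 here. A sketch of one way to turn the percentages into whole counts that always sum to N is the largest-remainder method: floor every product, then hand the leftover slots to the groups with the largest fractional parts. allocate is a hypothetical helper, not part of pandas, and it assumes the shares sum to 1.

```python
import math


def allocate(n, shares):
    """Split n sample slots across groups with the largest-remainder method.

    shares maps each group to its fraction of the sample; the fractions must sum to 1.
    """
    if not math.isclose(sum(shares.values()), 1.0):
        raise ValueError("shares must sum to 1")
    exact = {g: n * s for g, s in shares.items()}
    # Start from the whole part of each group's exact quota
    counts = {g: math.floor(x) for g, x in exact.items()}
    leftover = n - sum(counts.values())
    # Give the remaining slots to the groups with the largest fractional parts
    # (sorted() is stable, so ties go to the group listed first)
    by_remainder = sorted(exact, key=lambda g: exact[g] - counts[g], reverse=True)
    for g in by_remainder[:leftover]:
        counts[g] += 1
    return counts


shares = {'M,SC': 0.50, 'F,SC': 0.25, 'M,ST': 0.20, 'F,ST': 0.05}
print(allocate(6, shares))  # {'M,SC': 3, 'F,SC': 2, 'M,ST': 1, 'F,ST': 0}
```

For N=6 this gives the same 3/2/1/0 split as above. The difference shows up for other sizes: with N=10, rounding each product separately gives 5 + 2 + 2 + 0 = 9 farmers (Python's round and NumPy both round halves to even), while allocate still hands out all 10 slots.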
This is what I have so far:
df
Out[41]:
Group ID
0 M,SC 1
1 M,SC 2
2 M,SC 3
3 M,SC 4
4 M,SC 5
5 F,SC 6
6 F,SC 7
7 F,SC 8
8 F,SC 9
9 M,ST 10
10 M,ST 11
11 M,ST 12
12 M,ST 13
13 M,ST 14
14 F,ST 15
15 F,ST 16
16 F,ST 17
17 F,ST 18
N=6
df.groupby('Group', group_keys=False).apply(lambda x: x.sample(int(np.rint(N*len(x)/len(df))))).sample(frac=1).reset_index(drop=True)
Out[43]:
Group ID
0 M,ST 14
1 M,SC 3
2 M,ST 10
3 M,SC 2
4 F,ST 15
5 F,SC 7
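The one-liner above does proportional allocation: each group's sample size is N times the group's share of the population, not the required percentage. The sketch below rebuilds the question's df (assuming IDs 1 to 5 are M,SC, 6 to 9 F,SC, 10 to 14 M,ST and 15 to 18 F,ST, as listed) and computes the per-group sizes that code draws.

```python
import numpy as np
import pandas as pd

# The question's data: IDs 1-5 M,SC, 6-9 F,SC, 10-14 M,ST, 15-18 F,ST
df = pd.DataFrame({
    'Group': ['M,SC'] * 5 + ['F,SC'] * 4 + ['M,ST'] * 5 + ['F,ST'] * 4,
    'ID': range(1, 19),
})
N = 6

# Same sizes as int(np.rint(N*len(x)/len(df))) inside the groupby/apply above
sizes = df.groupby('Group', sort=False).size()
proportional = np.rint(N * sizes / len(df)).astype(int)
print(proportional)  # M,SC 2, F,SC 1, M,ST 2, F,ST 1
```

So the group sizes (5, 4, 5, 4), not the percentages, drive the result: M,ST gets 2 farmers and F,ST gets 1 instead of the required 1 and 0.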
Now, I don't know how to apply the given percentages in the sampling (M,SC group: 50%, F,SC group: 25%, M,ST group: 20% and F,ST group: 5%); the code above selects a sample of N=6 in proportion to group size.
Below is the code I used to solve the problem:
import pandas as pd
import numpy as np

# Required share of the sample for each group, as floats
df['Proportion'] = df['Group'].map({'M,SC': 0.5, 'F,SC': 0.25, 'M,ST': 0.2, 'F,ST': 0.05})
# Number of farmers to draw per group (N=6); round() rounds halves to even, so 1.5 -> 2.0 and 0.3 -> 0.0
df['Sample'] = round(df['Proportion'] * 6, 0)
df['Selected Farmers_ID'] = df['Sample'].astype(int)
# Draw that many IDs inside each group; rows that are not drawn get NaN when the result is aligned back on the index
df['Selected Farmers_ID'] = df.groupby('Group').apply(lambda g: g['ID'].sample(g['Selected Farmers_ID'].iat[0])).reset_index(level=0)['ID']
# Keep only the selected farmers
df.dropna(subset=['Selected Farmers_ID'], inplace=True)
df
Out[11]:
Group ID Proportion Sample Selected Farmers_ID
1 M,SC 2 0.50 3.0 2.0
3 M,SC 4 0.50 3.0 4.0
4 M,SC 5 0.50 3.0 5.0
5 F,SC 6 0.25 2.0 6.0
8 F,SC 9 0.25 2.0 9.0
12 M,ST 13 0.20 1.0 13.0
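A sketch of another way to draw a fixed number of farmers per group, assuming the per-group counts are already worked out (3, 2, 1 and 0, as in the question). pandas' DataFrameGroupBy.sample takes a single n or frac for every group, so it cannot take a different count per group directly; instead this shuffles the whole frame once and keeps the first k rows of each group, which yields a uniformly random subset of size k within every stratum without groupby().apply. stratified_sample and quota are hypothetical names, not pandas APIs.

```python
import pandas as pd

# The question's data: IDs 1-5 M,SC, 6-9 F,SC, 10-14 M,ST, 15-18 F,ST
df = pd.DataFrame({
    'Group': ['M,SC'] * 5 + ['F,SC'] * 4 + ['M,ST'] * 5 + ['F,ST'] * 4,
    'ID': range(1, 19),
})

# Required number of farmers per group: 6 x 0.50, 6 x 0.25, 6 x 0.20, 6 x 0.05 after rounding
quota = {'M,SC': 3, 'F,SC': 2, 'M,ST': 1, 'F,ST': 0}


def stratified_sample(data, group_col, quota, seed=None):
    # Shuffle all rows once, so the order within every group is random
    shuffled = data.sample(frac=1, random_state=seed)
    # Position of each row within its group in the shuffled order (0, 1, 2, ...)
    rank = shuffled.groupby(group_col).cumcount()
    # Keep the first quota[g] rows of each group; groups missing from quota get 0
    wanted = shuffled[group_col].map(quota).fillna(0)
    return shuffled[rank < wanted].sort_index()


selected = stratified_sample(df, 'Group', quota, seed=42)
print(selected)
```

If a group has fewer rows than its quota, this keeps all of that group's rows instead of raising an error, and a fixed seed makes the draw reproducible.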