Create groups by random assignment in python

Question

I have a dataset in which model scores are divided into 3 categories (high, medium, and low). The table looks like this:

| Score   |
| ------- |
| high    |
| high    |
| high    |
| low     |
| low     |
| low     |
| medium  |
| medium  |
| medium  |

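I want to randomly assign these scores to 4 groups: control, treatment 1, treatment 2, and treatment 3. The control group should have 20% of the observations, and the remaining 80% must be split into the other 3 groups of equal size. However, I want the distribution of scores (high, medium, low) to be the same in every group. How do I go about this using python?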

PS: This is just a representation of the actual table; the real one will have many more observations.

Answer 1

You can try groupby.transform:

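import numpy as np

cats = ['control', 'treatment 1', 'treatment 2', 'treatment 3']
probs = [.2, .8/3, .8/3, .8/3]

# For each score category, draw one group label per row with the given probabilities.
# The result is a Series of labels aligned with df's index.
(df.groupby('Score')['Score']
   .transform(lambda x: np.random.choice(cats, size=len(x), p=probs, replace=True))
)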
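Note that np.random.choice draws each label independently, so the 20% / 80% split within each score only holds on average, not exactly. If you need the counts per score to be as close to the target proportions as possible, one option is to build the label list with fixed counts and shuffle it. Below is a minimal sketch of that idea; the helper name `assign_groups`, the `group` column, and the 30-rows-per-score sample data are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

cats = ['control', 'treatment 1', 'treatment 2', 'treatment 3']
probs = [.2, .8/3, .8/3, .8/3]

def assign_groups(scores, rng):
    """Return shuffled labels whose counts follow probs as closely as len(scores) allows."""
    n = len(scores)
    # Cumulative cut points rounded to whole rows; the last one is forced to n
    cuts = np.round(np.cumsum(probs) * n).astype(int)
    cuts[-1] = n
    # Number of rows each group receives within this score category
    counts = np.diff(np.concatenate(([0], cuts)))
    labels = np.repeat(cats, counts)
    return rng.permutation(labels)

# Hypothetical example data: 30 rows per score category
df = pd.DataFrame({'Score': ['high'] * 30 + ['low'] * 30 + ['medium'] * 30})
rng = np.random.default_rng(0)
df['group'] = df.groupby('Score')['Score'].transform(lambda x: assign_groups(x, rng))
print(pd.crosstab(df['Score'], df['group']))
```

With 30 rows per score, every score category ends up with 6 control rows and 8 rows in each treatment group. When the category size does not divide evenly, the rounding decides which groups get the extra row, so the split is only approximately 20/80.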

python

distribution

pandas

uniform-distribution