Randomly change a fraction of the values of a single group to the values of other groups

Question

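I have a DataFrame with a column class that holds 3 text values: 'positive', 'negative', and 'neutral'. I want to change 40% of the neutral values to positive and 30% of the neutral values to negative, using pandas in Python.

The remaining 30% of the neutral values should stay unchanged in the DataFrame.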

Answer 1

Example setup:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'col': np.random.choice(['positive', 'negative', 'neutral'], 1000)})

#         col
# 0  positive
# 1  negative
# 2  positive
# 3  negative
# 4  negative

df['col'].value_counts(normalize=True)
# positive    0.337
# negative    0.335
# neutral     0.328

We can then get the index of the neutral rows, shuffle it, and split it:

# get shuffled index of neutral
idx = df[df['col'].eq('neutral')].sample(frac=1).index
L = len(idx)

# replace first random 40%
df.loc[idx[:int(L*0.4)], 'col'] = 'positive'
# replace next random 30%
df.loc[idx[int(L*0.4):int(L*0.7)], 'col'] = 'negative'

Value counts (as fractions):

>>> df['col'].value_counts(normalize=True)
positive    0.468
negative    0.433
neutral     0.099
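
The shuffle-and-split idea above can also be wrapped in a small helper so the fractions and the seed are explicit. This is only a sketch, not part of the original answer: the function name `reassign_fraction` and its parameters are hypothetical, it assumes the DataFrame index is unique, and it rounds the cut points with `round` instead of truncating with `int`, so counts may differ by one from the code above when the products are not whole numbers.

```python
import numpy as np
import pandas as pd


def reassign_fraction(df, column, source, targets, seed=None):
    """Return a copy of df in which fractions of the `source` rows are relabeled.

    `targets` maps each new label to the fraction of source rows that should
    receive it, e.g. {'positive': 0.4, 'negative': 0.3}. Rows not covered by
    any fraction keep the `source` label. Assumes a unique index.
    """
    if sum(targets.values()) > 1 + 1e-9:
        raise ValueError("fractions must sum to at most 1")
    out = df.copy()
    # Index labels of the source rows, in random order (reproducible with a seed).
    idx = out.index[out[column].eq(source).to_numpy()]
    rng = np.random.default_rng(seed)
    idx = idx[rng.permutation(len(idx))]
    n = len(idx)
    start = 0.0
    for label, frac in targets.items():
        stop = start + frac
        # Consecutive, non-overlapping slices of the shuffled index.
        out.loc[idx[round(n * start):round(n * stop)], column] = label
        start = stop
    return out


demo = pd.DataFrame({'col': ['neutral'] * 100 + ['positive'] * 50 + ['negative'] * 50})
result = reassign_fraction(demo, 'col', 'neutral',
                           {'positive': 0.4, 'negative': 0.3}, seed=0)
print(result['col'].value_counts())
# positive 90, negative 80, neutral 30
```

Because the slices are taken from one shuffled index, the two replacements never touch the same row, and the input DataFrame is left unmodified.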

dataset

dataframe

pandas