How to delete 50% of rows that share a certain column value

Question

df.groupby(['target']).count()

Target	data
Negative	103210
Positive	211082

Now my Positive data is too large. I want to delete 50% of the rows whose value in the Target column is Positive. How can I do that?

Answer 1

To keep half of the Positive rows, sample 50% of them with frac=0.5 and drop those indexes:

indexes = df[df.target == 'Positive'].sample(frac=0.5).index
df = df.drop(indexes)
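A self-contained sketch of the frac=0.5 approach on a small made-up DataFrame (the `target` column name matches the question; the row counts and `data` values are hypothetical). Passing `random_state` is optional and only makes the random draw reproducible.

```python
import pandas as pd

# Hypothetical toy data: 4 Negative rows and 10 Positive rows
df = pd.DataFrame({
    "target": ["Negative"] * 4 + ["Positive"] * 10,
    "data": range(14),
})

# Randomly pick half of the Positive rows (random_state fixes the draw)
indexes = df[df.target == "Positive"].sample(frac=0.5, random_state=0).index

# Drop those rows by index label; Negative rows are untouched
df = df.drop(indexes)

print(df.groupby("target").count())
```

Because `drop` works on index labels, this assumes the index is unique. If it contains duplicate labels, each dropped label removes every row carrying it, so call `df = df.reset_index(drop=True)` first.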

To keep exactly 100K Positive rows, sample 100K of them with n=100_000 and concat them with the Negative rows:

import pandas as pd

df = pd.concat([
    df[df.target == 'Negative'],
    df[df.target == 'Positive'].sample(n=100_000)
])
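A minimal runnable sketch of the exact-count approach, assuming a small hypothetical DataFrame and `n_keep = 3` standing in for 100_000. `pd.concat` stacks all Negative rows before the sampled Positive rows, so the optional `.sort_index()` is added here to restore the original row order.

```python
import pandas as pd

# Hypothetical toy data: 5 Negative rows and 9 Positive rows, interleaved
df = pd.DataFrame({
    "target": ["Positive", "Negative"] * 5 + ["Positive"] * 4,
    "data": range(14),
})

n_keep = 3  # stands in for 100_000 in the answer

df = pd.concat([
    df[df.target == "Negative"],
    df[df.target == "Positive"].sample(n=n_keep, random_state=0),
]).sort_index()  # concat groups rows by class; sort_index restores original order

print(df.groupby("target").count())
```

Two things to keep in mind with this variant: `sample(n=...)` raises a `ValueError` if `n` exceeds the number of Positive rows (unless `replace=True`), and any rows whose `target` is neither Negative nor Positive are silently dropped because only those two subsets are concatenated.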
