Filling a column with the amount of duplicated values in another column

Question

I have a df like this:

month	outcome	mom.ret
10/20	winner	0.2
10/20	winner	0.9
11/20	winner	0.6
11/20	winner	0.2
11/20	winner	0.9
10/20	loser	0.6
10/20	loser	0.2
10/20	loser	0.9
11/20	loser	0.6

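I want to add another column containing 1 / the number of times the value "winner" or "loser" appears in the outcome column for each month. The expected output for the example df is: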

month	outcome	mom.ret	q
10/20	winner	0.2	1/2
10/20	winner	0.9	1/2
11/20	winner	0.6	1/3
11/20	winner	0.2	1/3
11/20	winner	0.9	1/3
10/20	loser	0.6	1/3
10/20	loser	0.2	1/3
10/20	loser	0.9	1/3
11/20	loser	0.6	1/1

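I thought of using the count function to count how many times each value is repeated, but I need to specify that the count should be done separately for each date. Any ideas?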

Answer 1

You can use this code to get what you want, assuming your original DataFrame is called df:

import pandas as pd

# Count the rows in each (month, outcome) group
counts = df.groupby(['month', 'outcome'], as_index=False).count()
counts = counts.rename(columns={'mom.ret': 'q'})
# Use this line if you want the float value of the division, e.g. 0.5
# counts['q'] = 1 / counts['q']
# Use this line if you want the string '1/2'
counts['q'] = counts['q'].apply(lambda x: f'1/{x}')
# Attach the per-group value to every original row
result = pd.merge(df, counts, on=['month', 'outcome'], how='left')

The result looks like this:

   month outcome  mom.ret    q
0  10/20  winner      0.2  1/2
1  10/20  winner      0.9  1/2
2  11/20  winner      0.6  1/3
3  11/20  winner      0.2  1/3
4  11/20  winner      0.9  1/3
5  10/20   loser      0.6  1/3
6  10/20   loser      0.2  1/3
7  10/20   loser      0.9  1/3
8  11/20   loser      0.6  1/1
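
One caveat with the approach above: count() skips NaN, so if mom.ret can contain missing values the group counts may come out too small. Below is a minimal sketch of a variant that counts rows with size() instead, using the sample data from the question. The helper column name n and the validate argument are additions for illustration, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "month": ["10/20"] * 2 + ["11/20"] * 3 + ["10/20"] * 3 + ["11/20"],
    "outcome": ["winner"] * 5 + ["loser"] * 4,
    "mom.ret": [0.2, 0.9, 0.6, 0.2, 0.9, 0.6, 0.2, 0.9, 0.6],
})

# size() counts rows per group regardless of NaN in other columns
counts = df.groupby(["month", "outcome"]).size().reset_index(name="n")
counts["q"] = "1/" + counts["n"].astype(str)

# Left merge keeps every original row, in the original order;
# validate raises if counts has duplicate (month, outcome) keys
result = pd.merge(
    df,
    counts[["month", "outcome", "q"]],
    on=["month", "outcome"],
    how="left",
    validate="many_to_one",
)
print(result)
```

Merging only the month, outcome and q columns keeps the helper column n out of the final result.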

Answer 2

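Use df['q'] = 1 / df.groupby(['month', 'outcome'])['mom.ret'].transform('count'). Selecting a single column before transform makes the result a Series aligned with the original rows, so it can be assigned directly as a new column.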
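
For reference, here is a small self-contained sketch of this one-liner applied to the sample data from the question. The intermediate name group_size is introduced here only for readability.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "month": ["10/20"] * 2 + ["11/20"] * 3 + ["10/20"] * 3 + ["11/20"],
    "outcome": ["winner"] * 5 + ["loser"] * 4,
    "mom.ret": [0.2, 0.9, 0.6, 0.2, 0.9, 0.6, 0.2, 0.9, 0.6],
})

# Number of non-null mom.ret values in each (month, outcome) group,
# broadcast back so that every original row gets its group's count
group_size = df.groupby(["month", "outcome"])["mom.ret"].transform("count")

# q is the reciprocal of the group size, as a float
df["q"] = 1 / group_size
print(df)
```

This gives q as a float (0.5 for a group of two rows); formatting it as a string such as '1/2' is a separate step.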

Answer 3

Updated answer:

As @timgeb pointed out, the groupby needs to include month. To output fractions rather than decimals, I used the handy humanize library.

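import humanize  # pip install humanize, if needed

# Reciprocal of each (month, outcome) group's size, aligned to the original rows
df['q'] = 1 / df.groupby(['month', 'outcome'])['month'].transform('count')
# Render the float as a fraction string, e.g. 0.5 -> '1/2' (a value of 1.0 is rendered as '1')
df['q'] = df['q'].apply(humanize.fractional)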

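Note that you can't use .count() on its own with groupby, because it returns one value per group. You need transform, which returns a Series the same length as the original DataFrame.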
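
To make the difference concrete, here is a minimal sketch on a three-row frame. The toy data and the names per_group and per_row are made up for illustration.

```python
import pandas as pd

# Tiny illustrative frame: two rows in one group, one row in another
df = pd.DataFrame({
    "month": ["10/20", "10/20", "11/20"],
    "outcome": ["winner", "winner", "winner"],
})

grouped = df.groupby(["month", "outcome"])["month"]

per_group = grouped.count()           # one value per (month, outcome) group
per_row = grouped.transform("count")  # one value per original row

print(len(per_group))  # number of distinct groups
print(len(per_row))    # number of rows in df
```

Because per_row shares the index of df, it can be assigned straight to a new column, whereas per_group is indexed by (month, outcome) and would first have to be merged back.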

Using Python 3.9.7, pandas 1.4.1

Code to build the original df (I left out the irrelevant mom.ret column).

import pandas as pd

df = pd.DataFrame(
    {
        "month": [
            "10/20",
            "10/20",
            "11/20",
            "11/20",
            "11/20",
            "10/20",
            "10/20",
            "10/20",
            "11/20",
        ],
        "outcome": [
            "winner",
            "winner",
            "winner",
            "winner",
            "winner",
            "loser",
            "loser",
            "loser",
            "loser",
        ],
    }
)
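
If installing humanize is not an option, the "1/n" strings can also be built with plain string concatenation. This is a sketch of an alternative, not the method used in the answer above. It has the side effect of producing "1/1" for single-row groups, which matches the expected output in the question. The name n is introduced here for illustration.

```python
import pandas as pd

# Same month/outcome data as above, written compactly
df = pd.DataFrame({
    "month": ["10/20"] * 2 + ["11/20"] * 3 + ["10/20"] * 3 + ["11/20"],
    "outcome": ["winner"] * 5 + ["loser"] * 4,
})

# Group size for every row, aligned with df's index
n = df.groupby(["month", "outcome"])["month"].transform("count")

# Build the fraction as text: "1/" followed by the group size
df["q"] = "1/" + n.astype(str)
print(df)
```

The trade-off is that q is then a string column, so keep the numeric 1 / n version as well if further arithmetic is needed.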

python

counting

np

pandas