Filling a column with the number of duplicated values in another column
I have a df like this:
month | outcome | mom.ret |
---|---|---|
10/20 | winner | 0.2 |
10/20 | winner | 0.9 |
11/20 | winner | 0.6 |
11/20 | winner | 0.2 |
11/20 | winner | 0.9 |
10/20 | loser | 0.6 |
10/20 | loser | 0.2 |
10/20 | loser | 0.9 |
11/20 | loser | 0.6 |
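I want to add another column containing 1 divided by the number of times the value "winner" or "loser" appears in the outcome column for each month. The expected output for the example df is:
month | outcome | mom.ret | q |
---|---|---|---|
10/20 | winner | 0.2 | 1/2 |
10/20 | winner | 0.9 | 1/2 |
11/20 | winner | 0.6 | 1/3 |
11/20 | winner | 0.2 | 1/3 |
11/20 | winner | 0.9 | 1/3 |
10/20 | loser | 0.6 | 1/3 |
10/20 | loser | 0.2 | 1/3 |
10/20 | loser | 0.9 | 1/3 |
11/20 | loser | 0.6 | 1/1 |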
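I thought about using the count function to count how many times each value repeats, but I need the count to be done separately for each date. Any ideas?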
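You can use this code to achieve what you want, assuming your original DataFrame is called df:
import pandas as pd

# Count the rows in each (month, outcome) group
counts = df.groupby(['month', 'outcome'], as_index=False).count()
counts = counts.rename(columns={'mom.ret': 'q'})
# Use this line if you want the float value of the division, e.g. 0.5
# counts['q'] = 1/counts['q']
# Use this line if you want the string '1/2'
counts['q'] = counts['q'].apply(lambda x: f'1/{x}')
# Merge on the shared columns (month, outcome) to attach q to every row
result = pd.merge(df, counts)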
For the example df in the question, the result looks like this:
   month outcome  mom.ret    q
0  10/20  winner      0.2  1/2
1  10/20  winner      0.9  1/2
2  11/20  winner      0.6  1/3
3  11/20  winner      0.2  1/3
4  11/20  winner      0.9  1/3
5  10/20   loser      0.6  1/3
6  10/20   loser      0.2  1/3
7  10/20   loser      0.9  1/3
8  11/20   loser      0.6  1/1
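Use
df['q'] = 1 / df.groupby(['month', 'outcome'])['mom.ret'].transform('count')
Selecting a single column before calling transform makes the result a Series that can be assigned directly to the new column.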
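To make the one-liner above concrete, here is a minimal, self-contained sketch that rebuilds the example df from the question and applies it. It uses transform("size") rather than "count" as an assumption on my part: size counts every row in the group, while count skips missing values in the selected column. The helper name group_size is only illustrative.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame(
    {
        "month": ["10/20", "10/20", "11/20", "11/20", "11/20",
                  "10/20", "10/20", "10/20", "11/20"],
        "outcome": ["winner"] * 5 + ["loser"] * 4,
        "mom.ret": [0.2, 0.9, 0.6, 0.2, 0.9, 0.6, 0.2, 0.9, 0.6],
    }
)

# Number of rows in each (month, outcome) group, repeated on every row of that group
group_size = df.groupby(["month", "outcome"])["mom.ret"].transform("size")

# Float version of 1 / count
df["q"] = 1 / group_size
print(df)
```

Because the result has the same index as df, no merge step is needed.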
Updated answer:
As @timgeb pointed out, I needed to group by month as well. To output fractions rather than decimals, I used the handy humanize library.
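import humanize  # pip install humanize, if needed

# Row count of each (month, outcome) group, repeated on every row
df['q'] = 1 / df.groupby(['month', 'outcome'])['month'].transform('count')
# Convert each float to a fraction string such as '1/3'
df['q'] = df['q'].apply(humanize.fractional)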
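If an extra dependency is not wanted, a similar result can probably be had with the standard library alone. The sketch below is an alternative I am suggesting, not part of the answer's method: it stores exact fractions.Fraction objects in one column and a literal "1/n" label in another. The column names q_exact and q_label are made up for illustration.

```python
from fractions import Fraction

import pandas as pd

# Example data copied from the question
df = pd.DataFrame(
    {
        "month": ["10/20", "10/20", "11/20", "11/20", "11/20",
                  "10/20", "10/20", "10/20", "11/20"],
        "outcome": ["winner"] * 5 + ["loser"] * 4,
        "mom.ret": [0.2, 0.9, 0.6, 0.2, 0.9, 0.6, 0.2, 0.9, 0.6],
    }
)

# Rows per (month, outcome) group, aligned with the original rows
counts = df.groupby(["month", "outcome"])["mom.ret"].transform("size")

# Exact rational values; note that str(Fraction(1, 1)) is "1", not "1/1"
df["q_exact"] = [Fraction(1, int(n)) for n in counts]

# Literal labels, which keep the "1/1" form asked for in the question
df["q_label"] = ["1/" + str(int(n)) for n in counts]
print(df)
```

The label column is only text, so keep the Fraction (or float) column around if the values are needed for arithmetic later.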
Note that you can't just use .count() with the groupby: you need the transform method, which returns a Series of the same length as the original DataFrame.
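The difference in length is easy to check on the example data. This is a small sketch under the assumption that the df matches the one in the question; per_group and per_row are illustrative names.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame(
    {
        "month": ["10/20", "10/20", "11/20", "11/20", "11/20",
                  "10/20", "10/20", "10/20", "11/20"],
        "outcome": ["winner"] * 5 + ["loser"] * 4,
        "mom.ret": [0.2, 0.9, 0.6, 0.2, 0.9, 0.6, 0.2, 0.9, 0.6],
    }
)

grouped = df.groupby(["month", "outcome"])["mom.ret"]

# Aggregation: one value per (month, outcome) pair, indexed by the group keys
per_group = grouped.count()

# Transformation: one value per original row, indexed like df
per_row = grouped.transform("count")

print(len(per_group), len(per_row), len(df))
```

The aggregated result is indexed by the group keys, so it does not line up with df row by row, which is why the first answer has to merge it back in.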
Tested with Python 3.9.7 and pandas 1.4.1.
Code to create the original df (I left out the mom.ret column, which is irrelevant here):
import pandas as pd

df = pd.DataFrame(
    {
        "month": [
            "10/20",
            "10/20",
            "11/20",
            "11/20",
            "11/20",
            "10/20",
            "10/20",
            "10/20",
            "11/20",
        ],
        "outcome": [
            "winner",
            "winner",
            "winner",
            "winner",
            "winner",
            "loser",
            "loser",
            "loser",
            "loser",
        ],
    }
)