Create a new column based on cumulative occurrences of a specific value in another column (pandas)

Question

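I want to count how many times a specific value (a string) occurs in a column, and keep a running total of that count in another column.

For example, here is the cumulative count of the Y values: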

col_1  new_col
Y        1
Y        2
N        2
Y        3
N        3

I wrote the code below, but it gives me the final total rather than the cumulative frequency.

df['new_col'] = 0
df['new_col'] = df.loc[df.col_1 == 'Y'].count()
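A minimal sketch of why that attempt cannot produce a running count: `.count()` is an aggregation, so it reduces the filtered rows to a single total, while a cumulative frequency needs one value per row. Taking `cumsum` of a boolean mask (a standard pandas idiom, not something from the question) gives the per-row version. The five-row frame below is assumed from the example table.

```python
import pandas as pd

# Example data assumed from the table in the question
df = pd.DataFrame({'col_1': ['Y', 'Y', 'N', 'Y', 'N']})

is_y = df['col_1'].eq('Y')    # True on rows that hold 'Y'

total = int(is_y.sum())       # aggregate: one number for the whole column
running = is_y.cumsum()       # cumulative: one number per row (True counts as 1)

print(total)             # 3
print(running.tolist())  # [1, 2, 2, 3, 3]
```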

Answer 1

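To count cumulatively over both values (each value's own running count, followed by a running maximum across rows), you can use: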

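df['new_col'] = (df
                 .groupby('col_1')    # one group per distinct value
                 .cumcount().add(1)   # 1-based running count within each group
                 .cummax()            # running maximum over all rows so far
                 )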
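Because of the final `cummax`, this approach tracks whichever value has appeared most often so far, so it equals the count of 'Y' only while 'Y' is in the lead. A minimal sketch that wraps the snippet in a hypothetical helper, `running_max_count`, and runs it on the example data and on a second, hypothetical input where 'N' comes first:

```python
import pandas as pd

def running_max_count(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper wrapping the first approach from the answer."""
    return (df
            .groupby('col_1')
            .cumcount().add(1)  # 1-based running count within each value
            .cummax())          # running maximum over all rows so far

example = pd.DataFrame({'col_1': ['Y', 'Y', 'N', 'Y', 'N']})
example_result = running_max_count(example)
print(example_result.tolist())  # [1, 2, 2, 3, 3]

# Hypothetical input where 'N' leads: the result no longer counts 'Y'
n_first = pd.DataFrame({'col_1': ['N', 'N', 'Y']})
n_first_result = running_max_count(n_first)
print(n_first_result.tolist())  # [1, 2, 2]
```

On the second input the count of 'Y' would be 0, 0, 1, which is why the 'Y'-specific variant below exists.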

If you only want to track 'Y':

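df['new_col'] = (df
                 .groupby('col_1')
                 .cumcount().add(1)            # running count within each value
                 .where(df['col_1'].eq('Y'))   # keep it only on 'Y' rows, NaN elsewhere
                 .ffill()                      # carry the last 'Y' count forward
                 .fillna(0)                    # rows before the first 'Y' get 0
                 .astype(int)                  # back to integers after the NaNs
                 )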
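The `fillna(0)` step only matters when the column starts with values other than 'Y': those rows are still NaN after `ffill`. A minimal sketch using a hypothetical helper, `running_y_count`, and a hypothetical input that starts with 'N'. It casts with `astype(int)` because newer pandas versions deprecate the `downcast` argument of `fillna`.

```python
import pandas as pd

def running_y_count(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper wrapping the 'Y'-only approach from the answer."""
    return (df
            .groupby('col_1')
            .cumcount().add(1)            # 1-based running count within each value
            .where(df['col_1'].eq('Y'))   # keep it only on 'Y' rows, NaN elsewhere
            .ffill()                      # carry the last 'Y' count forward
            .fillna(0)                    # rows before the first 'Y' have seen none
            .astype(int))                 # NaN forced float; cast back to int

y_first = running_y_count(pd.DataFrame({'col_1': ['Y', 'Y', 'N', 'Y', 'N']}))
print(y_first.tolist())  # [1, 2, 2, 3, 3]

# Hypothetical input starting with 'N': the first row falls back to 0
n_first = running_y_count(pd.DataFrame({'col_1': ['N', 'Y', 'N', 'Y']}))
print(n_first.tolist())  # [0, 1, 1, 2]
```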

Output:

  col_1  new_col
0     Y        1
1     Y        2
2     N        2
3     Y        3
4     N        3
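If you want a separate running count for every value rather than a single column, one option (not from the original answer) is to one-hot encode the column with `pd.get_dummies` and take a cumulative sum of each indicator column. A minimal sketch, assuming the example data; the `count_` prefix is a hypothetical naming choice.

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['Y', 'Y', 'N', 'Y', 'N']})

# One indicator column per distinct value ('N', 'Y'): 1 where the row holds it
indicators = pd.get_dummies(df['col_1']).astype(int)

# Running total of each indicator = running count of each value
counts = indicators.cumsum().add_prefix('count_')

result = df.join(counts)
print(result)
```

Here `count_Y` matches the `new_col` output above, and `count_N` runs 0, 0, 1, 1, 2.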
