Rolling occurrences from two columns

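In pandas I have two series (columns) of x rows, and I want to add a column that gives a rolling count of how many times the value in col1 has appeared, in either column, from the first row up to row x-1.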

The df looks like this:

   col1 col2
0  B    A
1  B    C
2  A    B
3  A    B
4  A    C
5  B    A

The desired output is:

   col1 col2 freq
0  B    A    0
1  B    C    1
2  A    B    1
3  A    B    2
4  A    C    3    #A appears 3 times in the two columns from row 0 to 3
5  B    A    4    #B appears 4 times in the two columns from row 0 to 4

Thanks in advance from a beginner, G

from collections import defaultdict

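def fn():
    # Separate running counts for col1 and col2.
    d1, d2 = defaultdict(int), defaultdict(int)
    x = yield  # primed by next(f); receives the first row
    while True:
        # Occurrences of this row's col1 value in all earlier rows.
        freq = d1[x.col1] + d2[x.col1]
        # Record the current row before asking for the next one.
        d1[x.col1] += 1
        d2[x.col2] += 1
        x = yield freq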

f = fn()
next(f)
df['freq'] = df[['col1', 'col2']].apply(lambda x: f.send(x), axis=1)

print(df)

Prints:

  col1 col2  freq
0    B    A     0
1    B    C     1
2    A    B     1
3    A    B     2
4    A    C     3
5    B    A     4
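
The generator approach depends on Python's send() protocol: next(f) advances the generator to its first bare yield, and each later f.send(row) resumes it with that row and returns whatever the next yield produces. As a minimal sketch without pandas (the name running_count and the sample values are illustrative assumptions, not part of the answer above):

```python
from collections import defaultdict

def running_count():
    """Yield how many times each sent value was seen before it was sent."""
    seen = defaultdict(int)
    value = yield                # priming point: next() stops here
    while True:
        previous = seen[value]   # count before this value is recorded
        seen[value] += 1
        value = yield previous   # hand back the count, wait for the next send()

gen = running_count()
next(gen)                        # advance to the first bare yield
results = [gen.send(v) for v in ["B", "B", "A", "B"]]
print(results)  # [0, 1, 0, 2]
```

The count is read and the counter updated before control is handed back, so each value is compared only against values sent earlier.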

Edit (a solution for any number of columns):

from collections import defaultdict

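def fn(cols):
    # One counter per column, in the same order as cols.
    dd = [defaultdict(int) for _ in cols]
    x = yield
    while True:
        # Occurrences of the first column's value in all earlier rows.
        freq = sum(d[x.iloc[0]] for d in dd)
        # Record the current row before asking for the next one.
        for i, d in enumerate(dd):
            d[x.iloc[i]] += 1
        x = yield freq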

cols = ['col1', 'col2']
f = fn(cols)
next(f)
df['freq'] = df[cols].apply(lambda x: f.send(x), axis=1)

print(df)
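
As a cross-check for the generator-based versions, the same running count can be sketched with a plain loop and a single Counter shared by all columns. The helper name rolling_freq is hypothetical (it is not a pandas function), and the frame is rebuilt from the question's sample data:

```python
from collections import Counter

import pandas as pd

def rolling_freq(frame, cols):
    """Count prior occurrences of each row's first-column value across cols."""
    seen = Counter()
    out = []
    for row in frame[cols].itertuples(index=False, name=None):
        out.append(seen[row[0]])   # occurrences in all earlier rows
        seen.update(row)           # then record every value in this row
    return out

df = pd.DataFrame({
    "col1": list("BBAAAB"),
    "col2": list("ACBBCA"),
})
df["freq"] = rolling_freq(df, ["col1", "col2"])
print(df["freq"].tolist())  # [0, 1, 1, 2, 3, 4]
```

Because the lookup happens before the row is recorded, a row never counts its own values, and adding a third column only requires extending the cols list.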

This works regardless of the number of columns in df:

import pandas as pd
import numpy as np

def add(d1, d2):
    # Merge the counts in d2 into d1 (d1 is modified in place and returned).
    for key in d2:
        d1[key] = d1.get(key, 0) + d2[key]
    return d1

if __name__ == '__main__':
    counts = {}
    df = pd.DataFrame({"a": [1, 2, 3, 1, 2], "b": [2, 1, 2, 3, 1]})
    col = list(df)
    freq = []
    for ind, it in df.iterrows():
        # Look up the first column's value before counting the current row,
        # so a row never counts itself (even if both columns hold the same value).
        freq.append(counts.get(it[col[0]], 0))
        unique, count = np.unique(it, return_counts=True)
        counts = add(counts, dict(zip(unique, count)))
    df["freq"] = freq

Let's use some DataFrame reshaping, groupby, and cumcount:

dfs = df.stack()
df['freq'] = dfs.groupby(dfs).cumcount().unstack()['col1']
print(df)

Output:

  col1 col2  freq
0    B    A     0
1    B    C     1
2    A    B     1
3    A    B     2
4    A    C     3
5    B    A     4
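
To see why the reshaping works, the one-liner can be split into its intermediate steps. This is a sketch that rebuilds the question's sample frame; the variable names stacked, prior, and wide are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({"col1": list("BBAAAB"), "col2": list("ACBBCA")})

# Long format: one entry per (row, column) pair, in row-major order,
# so every earlier row comes before the current row's entries.
stacked = df[["col1", "col2"]].stack()

# Number each entry by how many equal values precede it.
prior = stacked.groupby(stacked).cumcount()

# Back to wide format; keep only the counts that belong to col1.
wide = prior.unstack()
df["freq"] = wide["col1"]
print(df["freq"].tolist())  # [0, 1, 1, 2, 3, 4]
```

Since col1 is stacked before col2 within each row, the col1 count for a row never includes that row's own col2 value.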