如何在 pandas 数据框中按日期计算字符串列表中所有元素的出现次数

How to count the number of appearances for all elements in a list of string by date in pandas dataframe

我有以下数据框。 objective是统计"from"和"to"中的所有元素按日期出现的次数。

id  date         from   to      
A   01/01/2020  alice  bob        
A   01/01/2020  alice  carl        
A   01/02/2020  jim    peter
A   01/02/2020  jim    jen
B   01/02/2020  mary   john
B   01/03/2020  john   max
B   01/03/2020  mike   jane
B   01/03/2020  jane   jim
C   01/02/2020  xavier jim
C   01/02/2020  jim    alice
C   01/03/2020  alice  bob
C   01/03/2020  manny  lily
C   01/04/2020  rosa   bob
..     ..        ..     ..

所需的数据帧

name    date     count
alice 01/01/2020   2
alice 01/02/2020   1
alice 01/03/2020   1
bob   01/01/2020   1
bob   01/03/2020   1
bob   01/04/2020   1
carl  01/01/2020   1
peter 01/02/2020   1
jane  01/03/2020   2
jen   01/02/2020   1
jim   01/02/2020   4
jim   01/03/2020   1
john  01/02/2020   1
john  01/03/2020   1
lily  01/03/2020   1
mary  01/02/2020   1
manny 01/03/2020   1
max   01/03/2020   1
mike  01/03/2020   1
rosa  01/04/2020   1
xavier01/02/2020   1

任何帮助将不胜感激!

您可以 set_index + stack 并使用 groupby + value_counts。剩下的只是为了获得所需形式的输出而进行的外观更改:

out = (df.set_index('date')[['from','to']].stack()
       .groupby(level='date').value_counts()
       .reset_index(level=0, name='count').sort_index()
       .rename_axis(index=['name']).reset_index())

输出:

      name        date  count
0    alice  01/01/2020      2
1    alice  01/02/2020      1
2    alice  01/03/2020      1
3      bob  01/01/2020      1
4      bob  01/03/2020      1
5      bob  01/04/2020      1
6     carl  01/01/2020      1
7     jane  01/03/2020      2
8      jen  01/02/2020      1
9      jim  01/02/2020      4
10     jim  01/03/2020      1
11    john  01/02/2020      1
12    john  01/03/2020      1
13    lily  01/03/2020      1
14   manny  01/03/2020      1
15    mary  01/02/2020      1
16     max  01/03/2020      1
17    mike  01/03/2020      1
18   peter  01/02/2020      1
19    rosa  01/04/2020      1
20  xavier  01/02/2020      1

使用melt作为起点:

out = (df.melt(['id', 'date'], value_name='name')
         .value_counts(['name', 'date'], sort=False)
         .rename('count').reset_index())
print(out)

# Output
      name        date  count
0    alice  01/01/2020      2
1    alice  01/02/2020      1
2    alice  01/03/2020      1
3      bob  01/01/2020      1
4      bob  01/03/2020      1
5      bob  01/04/2020      1
6     carl  01/01/2020      1
7     jane  01/03/2020      2
8      jen  01/02/2020      1
9      jim  01/02/2020      4
10     jim  01/03/2020      1
11    john  01/02/2020      1
12    john  01/03/2020      1
13    lily  01/03/2020      1
14   manny  01/03/2020      1
15    mary  01/02/2020      1
16     max  01/03/2020      1
17    mike  01/03/2020      1
18   peter  01/02/2020      1
19    rosa  01/04/2020      1
20  xavier  01/02/2020      1