Is there a way to build a co-occurrence (frequency) matrix for participant-organizer in python?

Suppose we have a DataFrame like the following:

import pandas as pd

df = pd.DataFrame({'participant_id': [1608, 1608, 2089, 213, 1608, 1887, 2089, 4544, 6866, 2020, 2020],
                   'organizer_id': [1772, 1772, 1772, 1790, 1790, 1790, 1791, 1791, 1772, 1799, 1799]})

Printing it gives:

print(df)

    participant_id  organizer_id
0             1608          1772
1             1608          1772
2             2089          1772
3              213          1790
4             1608          1790
5             1887          1790
6             2089          1791
7             4544          1791
8             6866          1772
9             2020          1799
10            2020          1799

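It would be valuable to know how many times each participant took part in each organizer's tasks, in the form of a co-occurrence matrix like this: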

      1772  1790  1791  1799
1608     2     1     0     0
2089     1     0     1     0
213      0     1     0     0
1887     0     1     0     0
4544     0     0     1     0
6866     1     0     0     0
2020     0     0     0     2

How can I build such a matrix from the DataFrame df in Python?

df.groupby(by=["participant_id", "organizer_id"]).size().unstack('organizer_id').fillna(0)

organizer_id    1772  1790  1791  1799
participant_id                        
213              0.0   1.0   0.0   0.0
1608             2.0   1.0   0.0   0.0
1887             0.0   1.0   0.0   0.0
2020             0.0   0.0   0.0   2.0
2089             1.0   0.0   1.0   0.0
4544             0.0   0.0   1.0   0.0
6866             1.0   0.0   0.0   0.0
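
The counts above come back as floats because unstack first fills missing pairs with NaN and fillna(0) replaces them afterwards. A minimal sketch of a variant, assuming integer counts are preferred and that rows should follow the order in which participants first appear in df (as in the matrix in the question). The names counts and ordered are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'participant_id': [1608, 1608, 2089, 213, 1608, 1887, 2089, 4544, 6866, 2020, 2020],
    'organizer_id': [1772, 1772, 1772, 1790, 1790, 1790, 1791, 1791, 1772, 1799, 1799],
})

# Count each (participant, organizer) pair, then pivot organizers into columns.
# fill_value=0 fills missing pairs directly, so no NaN is introduced and the
# counts keep an integer dtype.
counts = (
    df.groupby(['participant_id', 'organizer_id'])
      .size()
      .unstack('organizer_id', fill_value=0)
)

# Optional: reorder rows by first appearance in df instead of sorted order.
ordered = counts.reindex(df['participant_id'].unique())
print(ordered)
```

The reindex step is optional; without it the rows are sorted by participant_id, as in the output above.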

This is a duplicate of How to create co-occurrence matrix from pandas two column?

Use pd.crosstab(df['participant_id'], df['organizer_id']) to get the output matrix.
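
For completeness, a self-contained sketch of the crosstab approach on the data from the question. The variable names matrix and with_totals are illustrative, and the margins=True call is an optional extra that was not part of the original question:

```python
import pandas as pd

df = pd.DataFrame({
    'participant_id': [1608, 1608, 2089, 213, 1608, 1887, 2089, 4544, 6866, 2020, 2020],
    'organizer_id': [1772, 1772, 1772, 1790, 1790, 1790, 1791, 1791, 1772, 1799, 1799],
})

# crosstab counts how often each (participant, organizer) pair occurs;
# pairs that never occur are filled with 0, so the result holds integers.
matrix = pd.crosstab(df['participant_id'], df['organizer_id'])
print(matrix)

# Optional: margins=True appends an 'All' row and column with the totals.
with_totals = pd.crosstab(df['participant_id'], df['organizer_id'], margins=True)
print(with_totals)
```

Like the groupby version, crosstab sorts rows and columns by label, so the row order differs from the matrix sketched in the question.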