Find the categories that frequently occur together based on another column

Suppose I have the following data in a Pandas DataFrame:

Paper ID    Author ID
Paper_1     Author_1
Paper_1     Author_2
Paper_2     Author_2
Paper_3     Author_1
Paper_3     Author_2
Paper_3     Author_3
Paper_4     Author_1
Paper_4     Author_3

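I need to find the number of collaborations for each pair of authors, listing only pairs with a non-zero count. So the output should be: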
(Author_1,Author_2) --> 2
(Author_1,Author_3) --> 1

Any help or suggestions would be appreciated.

If the data is fairly small, merging the DataFrame with itself on Paper ID will generate the author pairs, which can then be collapsed/aggregated:

# assume df has columns Paper ID, Author ID
df_merged = df.merge(df, on="Paper ID")

# keep only one instance of a collaboration, with the
# alphabetically smaller author first, e.g. (Author_1, Author_2)
mask = df_merged["Author ID_x"] < df_merged["Author ID_y"]

# aggregate (note the use of the mask to avoid double-
# counting and self-collaborations as noted in the
# comment by Riccardo Bucco)
counts = (
    df_merged[mask]
    .groupby(["Author ID_x", "Author ID_y"])
    .agg(collaboration_count=("Paper ID", "count"))
)
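
To try the merge approach end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same steps. The function name count_collaborations is a hypothetical helper, not part of pandas. Putting the alphabetically smaller author first in each pair is a choice made here for readability, and the sketch assumes each (Paper ID, Author ID) row appears only once.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Paper ID": [
            "Paper_1", "Paper_1", "Paper_2", "Paper_3",
            "Paper_3", "Paper_3", "Paper_4", "Paper_4",
        ],
        "Author ID": [
            "Author_1", "Author_2", "Author_2", "Author_1",
            "Author_2", "Author_3", "Author_1", "Author_3",
        ],
    }
)


def count_collaborations(frame):
    """Hypothetical helper: count shared papers per author pair."""
    # Self-merge: every pair of authors that share a paper becomes a row.
    merged = frame.merge(frame, on="Paper ID")
    # A strict inequality drops self-pairs and mirrored duplicates.
    mask = merged["Author ID_x"] < merged["Author ID_y"]
    return (
        merged[mask]
        .groupby(["Author ID_x", "Author ID_y"])
        .size()
        .rename("collaboration_count")
        .reset_index()
    )


counts = count_collaborations(df)
print(counts)
```

On the sample rows this gives 2 for (Author_1, Author_2), 2 for (Author_1, Author_3) because both Paper_3 and Paper_4 list that pair, and 1 for (Author_2, Author_3) from Paper_3. The self-merge creates one row per ordered author pair per paper, so memory grows quadratically with the number of authors on a paper. That is why the approach suits small data.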