python, pandas, How to find connections between each group

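I'm having trouble finding the connections between groups, based on their associated data (possibly with groupby?), so that I can build a network.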

Two groups are connected if they have an element in common.

For example, my DataFrame looks like this:

group_number    data
1                a
2                a
2                b
2                c
2                a
3                c
4                a
4                c

So the output would be:

Source_group  Target_group  Frequency
2             1             1 (because a-a)
3             2             1 (because c-c)
4             2             2 (because a-a, c-c)

Of course, the (because ...) notes would not be in the output; they are only there as an explanation.

Thanks a lot.

I thought about your problem. You could do the following:

import pandas as pd
from collections import defaultdict

df = pd.DataFrame({'group_number': [1,2,2,2,2,3,4,4],
            'data': ['a','a','b','c','a','c','a','c']})

# group by (group_number, data) and store the counts in a nested dict:
# {group_number: {element: number of occurrences}}
d = defaultdict(dict)
for (group_number, element), group in df.groupby(['group_number', 'data']):
    d[group_number][element] = len(group)

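# iterate over the groups twice to compare every group with every
# other group; each pair appears in both directions (2-1 and 1-2),
# and pairs with no shared elements get a Frequency of 0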
relationships = []
for key, val in d.items():
    for k, v in d.items():
        if key != k:
            # get the references to two compared groups
            current_row_rel = {}
            current_row_rel['Source_group'] = key
            current_row_rel['Target_group'] = k
            # iterating a dict yields its keys, so this counts the
            # distinct elements the two groups have in common
            current_row_rel['Frequency'] = len(set(val).intersection(v))
            relationships.append(current_row_rel)

# convert the result to a pandas DataFrame for further analysis
# (a new name is used so the input df is not overwritten)
relationships_df = pd.DataFrame(relationships)

I'm sure this can be done without converting to a list of dictionaries. However, I find this solution more straightforward.
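For what it's worth, here is a minimal sketch of one way to skip the intermediate dictionaries, assuming a self-merge on the data column is acceptable for your data size. The names unique_pairs, pairs and edges are only illustrative, and the choice to keep the larger group number as the source is an assumption based on the example output in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'group_number': [1, 2, 2, 2, 2, 3, 4, 4],
    'data': ['a', 'a', 'b', 'c', 'a', 'c', 'a', 'c'],
})

# Keep one row per (group, element) so repeated elements
# (e.g. 'a' twice in group 2) are not counted twice.
unique_pairs = df.drop_duplicates()

# Self-merge on the element: every resulting row pairs two groups
# that both contain that element.
pairs = unique_pairs.merge(unique_pairs, on='data',
                           suffixes=('_source', '_target'))

# Assumption: keep each pair once, with the larger group number as the source.
# This also drops the rows where a group is paired with itself.
pairs = pairs[pairs['group_number_source'] > pairs['group_number_target']]

# Count the shared elements per (source, target) pair.
edges = (
    pairs.groupby(['group_number_source', 'group_number_target'])
         .size()
         .reset_index(name='Frequency')
         .rename(columns={'group_number_source': 'Source_group',
                          'group_number_target': 'Target_group'})
)
print(edges)
```

Note that for the sample data this also reports 4-1 (shared a) and 4-3 (shared c), which follow from the "same element" rule even though they are not listed in the question's example output. Pairs with nothing in common simply do not appear, so there are no zero-frequency rows to filter out.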