From multiple values per row of a pandas DataFrame: get two columns with every relation between the values (to analyse the network with NetworkX)

I have a dataframe containing people's names. The people in each row work together on the same project (item).

item   names
a      moriz, jon, cate
b      jon, lenard
c      cate, martin, leo, jil

I want two columns that contain every pair of people working on the same item:

item    person 1    person 2
a       moriz       jon
a       moriz       cate
a       jon         cate
b       jon         lenard
c       cate        martin
c       cate        leo
c       cate        jil
c       jil         martin
c       jil         leo
c       martin      leo

You could do it like this (where df is your dataframe):

import pandas as pd
from itertools import combinations

df = pd.DataFrame(
    {
        'item': ['a', 'b', 'c'],
        'names': ['moriz, jon, cate', 'jon, lenard', 'cate, martin, leo, jil']
    }
)

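# Split each names string into a list, then build every unordered pair of names
df.names = df.names.str.split(", ").map(lambda l: list(combinations(l, 2)))
# One row per pair (the original index value is repeated for each pair)
df = df.explode("names")
# Unpack each (name, name) tuple into two separate columns
df[["person 1", "person 2"]] = df.names.str.join(",").str.split(",", expand=True)
df = df.drop(columns="names")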

Result for the example:

  item person 1 person 2
0    a    moriz      jon
0    a    moriz     cate
0    a      jon     cate
1    b      jon   lenard
2    c     cate   martin
2    c     cate      leo
2    c     cate      jil
2    c   martin      leo
2    c   martin      jil
2    c      leo      jil
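
Since the goal is to analyse the network with NetworkX, here is a minimal sketch of how an edge list like this could be loaded into a graph. It assumes networkx is installed. The names edges, G and degrees are only illustrative, and the edge list is typed out again so the snippet runs on its own.

```python
import pandas as pd
import networkx as nx

# Edge list in the same shape as the result above
# (typed out here so the sketch is self-contained)
edges = pd.DataFrame(
    {
        "item": ["a", "a", "a", "b", "c", "c", "c", "c", "c", "c"],
        "person 1": ["moriz", "moriz", "jon", "jon", "cate",
                     "cate", "cate", "martin", "martin", "leo"],
        "person 2": ["jon", "cate", "cate", "lenard", "martin",
                     "leo", "jil", "leo", "jil", "jil"],
    }
)

# Each row becomes an undirected edge; the item is stored as an edge attribute
G = nx.from_pandas_edgelist(
    edges, source="person 1", target="person 2", edge_attr="item"
)

# Number of distinct collaborators per person
degrees = dict(G.degree())
```

A plain Graph keeps a single edge per pair, so if two people share several items only one item attribute survives. Passing create_using=nx.MultiGraph to from_pandas_edgelist would keep one edge per item instead.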