Merging rows in a dataframe based on reoccurring values

Question

I have the following dataframe, in which each row contains two values.

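If one or both values of a row appear again in another row, I want to merge those values. The principle is this: if A and B are together in one row, and B and C are together in another row, then A, B and C belong together. For the dataframe above, the result I want is: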

0    0   1
1    4   5
2    8   9
3   10  11
4   14  15
5   16  17 18 19
6   20  21

I tried building a loop with df.duplicated to produce this result, but haven't managed it yet.
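The merging rule above is transitive (A-B plus B-C means A, B, C are one group), which is exactly what a union-find (disjoint-set) structure computes. A minimal sketch in plain Python, assuming the rows are available as a list of (value, value) tuples, for example via list(df.itertuples(index=False)); merge_linked_pairs is a hypothetical helper name, and the sample pairs in the usage line are made up for illustration.

```python
def merge_linked_pairs(pairs):
    """Group values so that any two values linked by a chain of pairs end up together.

    merge_linked_pairs is a hypothetical helper, not part of pandas.
    """
    parent = {}

    def find(x):
        # Register unseen values as their own root, then walk up to the root,
        # halving the path as we go so later lookups are faster
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        # Attach the root of y's group under the root of x's group
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    # Every row links its two values
    for a, b in pairs:
        union(a, b)

    # Collect values by their root
    groups = {}
    for value in parent:
        groups.setdefault(find(value), []).append(value)
    # Sort values inside each group, then order groups by their smallest value
    return sorted(sorted(g) for g in groups.values())


print(merge_linked_pairs([(0, 1), (4, 5), (16, 17), (17, 18), (18, 19)]))
# [[0, 1], [4, 5], [16, 17, 18, 19]]
```

Because union-find only ever merges groups, the order of the rows does not matter: a pair that arrives late still joins the two groups it touches.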

Answer 1

This looks like a graph problem: finding connected components. You can use the networkx library (the code assumes the two columns are named a and b):

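import networkx as nx
import pandas as pd

# Each row (a, b) becomes an edge between the two values
g = nx.from_pandas_edgelist(df, 'a', 'b')

# Each connected component is one merged group; sets are unordered,
# so sort each one to put the smallest value in column a
pd.concat([pd.Series([sorted(c)[0],
                      ' '.join(map(str, sorted(c)[1:]))],
                     index=['a', 'b'])
           for c in nx.connected_components(g)], axis=1).T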

Output:

    a         b
0   0         1
1   4         5
2   8         9
3  10        11
4  14        15
5  16  17 18 19
6  20        21
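The pd.concat(...).T step builds one mixed int/str Series per component and then transposes, which leaves column a with object dtype. A plainer alternative, sketched here as an assumption rather than the answer's own code, builds the frame from a list of dicts so column a stays integer. components_to_frame is a hypothetical helper; it accepts nx.connected_components(g) or any iterable of sets/lists.

```python
import pandas as pd


def components_to_frame(components):
    """Turn an iterable of value groups into the two-column a/b layout.

    components_to_frame is a hypothetical helper, not part of pandas or networkx.
    """
    rows = []
    for comp in components:
        values = sorted(comp)  # sets have no guaranteed order, so sort first
        rows.append({'a': values[0],
                     'b': ' '.join(map(str, values[1:]))})
    # Order rows by each group's smallest value and renumber the index
    return (pd.DataFrame(rows, columns=['a', 'b'])
              .sort_values('a')
              .reset_index(drop=True))


print(components_to_frame([{16, 17, 18, 19}, {0, 1}, {4, 5}]))
```

With the graph from the answer, the call would be components_to_frame(nx.connected_components(g)).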

Tags: python, merge, duplicates, dataframe, pandas