给定 n 个表示对的元组,return 一个包含连接元组的列表

Given n tuples representing pairs, return a list with connected tuples

给定 n 个元组,编写一个函数,该函数将 return 一个包含连接值的列表。

示例:

pairs = [(1,62),
    (1,192),
    (1,168),
    (64,449),
    (263,449),      
    (192,289),
    (128,263),
    (128,345),
    (3,10),
    (10,11)
    ]

结果:

[[1,62,192,168,289],
 [64,449,263,128,345,449],
 [3,10,11]]     

我相信它可以使用图或树作为数据结构来解决,为每个值创建节点,并为每对创建边,每对树或图表示连接的值,但我还没有找到解决方案。

在 python 中产生一个结果的最佳方法是什么,该结果会产生这些对的连接值列表?

您可以通过 Disjoint Set (Union-Find) 实施来解决它。

用所有数字初始化结构djs。然后对于每个元组(x,y),调用djs.merge(x,y)。现在,对于每个数字 x,为其创建一个新集合当且仅当 djs.sameSet(x,)==false 来自每个现有集合的任意 y

也许that可以帮到你。

你似乎有一个图(以边列表的形式)可能不是一个整体("connected")你想把它分成几个部分("components").

一旦我们用图形语言考虑它,我们就大功告成了。我们可以保留到目前为止我们找到的所有组件的列表(这些将是节点集),如果它的伙伴已经存在,则将一个节点添加到该集合中。否则,为这对制作一个新组件。

def graph_components(edges):
    """
    Given a graph as a list of edges, divide the nodes into components.

    Takes a list of pairs of nodes, where the nodes are integers.
    Returns a list of sets of nodes (the components).
    """

    # A list of sets.
    components = []

    for v1, v2 in edges:
        # See if either end of the edge has been seen yet.
        for component in components:
            if v1 in component or v2 in component:
                # Add both vertices -- duplicates will vanish.
                component.add(v1)
                component.add(v2)
                break
        else:
            # If neither vertex is already in a component.
            components.append({v1, v2})

    return components

我使用了奇怪的 for ... else 构造来实现这个功能——如果 break 语句在 [=] 期间未达到,则执行 else 15=]。内部循环也可以是一个返回 TrueFalse.

的函数

编辑:正如 Francis Colas 指出的那样,这种方法太贪心了。这是一个完全不同的方法(非常感谢 Edward Mann this 漂亮的 DFS 实现)。

这种方法基于构建一个图,然后对其进行遍历,直到我们 运行 离开未访问的节点。它应该 运行 在线性时间内(O(n) 构建图形,O(n) 进行所有遍历,我相信 O(n) 只是做集合差异)。

from collections import defaultdict

def dfs(start, graph):
    """
    Does depth-first search, returning a set of all nodes seen.
    Takes: a graph in node --> [neighbors] form.
    """
    visited, worklist = set(), [start]

    while worklist:
        node = worklist.pop()
        if node not in visited:
            visited.add(node)
            # Add all the neighbors to the worklist.
            worklist.extend(graph[node])

    return visited

def graph_components(edges):
    """
    Given a graph as a list of edges, divide the nodes into components.
    Takes a list of pairs of nodes, where the nodes are integers.
    """

    # Construct a graph (mapping node --> [neighbors]) from the edges.
    graph = defaultdict(list)
    nodes = set()

    for v1, v2 in edges:
        nodes.add(v1)
        nodes.add(v2)

        graph[v1].append(v2)
        graph[v2].append(v1)

    # Traverse the graph to find the components.
    components = []

    # We don't care what order we see the nodes in.
    while nodes:
        component = dfs(nodes.pop(), graph)
        components.append(component)

        # Remove this component from the nodes under consideration.
        nodes -= component

    return components

我不知道这个问题已经有了名字(感谢 avim!),所以我继续天真地解决了它。

此解决方案与 Eli Rose 的解决方案有些相似。不过,我决定 post 它,因为它对于大型对列表更有效,因为 lists_by_element 字典跟踪元素所在的列表,允许我们避免每次我们需要添加新项目时,遍历所有列表及其项目。

代码如下:

def connected_tuples(pairs):
    # for every element, we keep a reference to the list it belongs to
    lists_by_element = {}

    def make_new_list_for(x, y):
        lists_by_element[x] = lists_by_element[y] = [x, y]

    def add_element_to_list(lst, el):
        lst.append(el)
        lists_by_element[el] = lst

    def merge_lists(lst1, lst2):
        merged_list = lst1 + lst2
        for el in merged_list:
            lists_by_element[el] = merged_list

    for x, y in pairs:
        xList = lists_by_element.get(x)
        yList = lists_by_element.get(y)

        if not xList and not yList:
            make_new_list_for(x, y)

        if xList and not yList:
            add_element_to_list(xList, y)

        if yList and not xList:
            add_element_to_list(yList, x)            

        if xList and yList and xList != yList:
            merge_lists(xList, yList)

    # return the unique lists present in the dictionary
    return set(tuple(l) for l in lists_by_element.values())

这是它的工作原理:http://ideone.com/tz9t7m

另一个比 wOlf 更紧凑但处理合并的解决方案与 Eli 的相反:

def connected_components(pairs):
    components = []
    for a, b in pairs:
        for component in components:
            if a in component:
                for i, other_component in enumerate(components):
                    if b in other_component and other_component != component: # a, and b are already in different components: merge
                        component.extend(other_component)
                        components[i:i+1] = []
                        break # we don't have to look for other components for b
                else: # b wasn't found in any other component
                    if b not in component:
                        component.append(b)
                break # we don't have to look for other components for a
            if b in component: # a wasn't in in the component 
                component.append(a)
                break # we don't have to look further
        else: # neither a nor b were found
            components.append([a, b])
    return components

请注意,当我在组件中找到元素时,我依赖于跳出循环,以便我可以使用循环的 else 子句来处理元素尚未在任何组件中的情况(如果循环结束时没有 break,则执行 else)。

我提出了 2 种不同的解决方案:

我更喜欢的第一个是将每条记录与父项链接起来。然后当然在层次结构中进一步导航,直到元素映射到自身。


所以代码是:

def build_mapping(input_pairs):
    mapping = {}

    for pair in input_pairs:
        left = pair[0]
        right = pair[1]

        parent_left = None if left not in mapping else mapping[left]
        parent_right = None if right not in mapping else mapping[right]

        if parent_left is None and parent_right is None:
            mapping[left] = left
            mapping[right] = left

            continue

        if parent_left is not None and parent_right is not None:
            if parent_left == parent_right:
                continue

            top_left_parent = mapping[parent_left]
            top_right_parent = mapping[parent_right]
            while top_left_parent != mapping[top_left_parent]:
                mapping[left] = top_left_parent
                top_left_parent = mapping[top_left_parent]

            mapping[top_left_parent] = top_right_parent
            mapping[left] = top_right_parent

            continue 

        if parent_left is None:
            mapping[left] = parent_right
        else:
            mapping[right] = parent_left

    return mapping


def get_groups(input_pairs):
    mapping = build_mapping(input_pairs)

    groups = {}
    for elt, parent in mapping.items():
        if parent not in groups:
            groups[parent] = set()

        groups[parent].add(elt)

    return list(groups.values())

因此,使用以下输入:

groups = get_groups([('A', 'B'), ('A', 'C'), ('D', 'A'), ('E', 'F'), 
                     ('F', 'C'), ('G', 'H'), ('I', 'J'), ('K', 'L'), 
                     ('L', 'M'), ('M', 'N')])

我们得到:

[{'A', 'B', 'C', 'D', 'E', 'F'}, {'G', 'H'}, {'I', 'J'}, {'K', 'L', 'M', 'N'}]

第二种可能效率较低的解决方案是:

def get_groups_second_method(input_pairs):
    groups = []

    for pair in input_pairs:
        left = pair[0]
        right = pair[1]

        left_group = None
        right_group = None
        for i in range(0, len(groups)):
            group = groups[i]

            if left in group:
                left_group = (group, i)

            if right in group:
                right_group = (group, i)

        if left_group is not None and right_group is not None:
            merged = right_group[0].union(left_group[0])
            groups[right_group[1]] = merged
            groups.pop(left_group[1])
            continue

        if left_group is None and right_group is None:
            new_group = {left, right}
            groups.append(new_group)
            continue

        if left_group is None:
            right_group[0].add(left)
        else:
            left_group[0].add(right)

    return groups

您也可以使用 networkx 作为依赖项。

import networkx as nx

pairs = [(1,62),
        (1,192),
        (1,168),
        (64,449),
        (263,449),      
        (192,289),
        (128,263),
        (128,345),
        (3,10),
        (10,11)]


G = nx.Graph()
G.add_edges_from(pairs)
list(nx.connected_components(G))