Sorting a List with Relative Positional Data

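This is more of a conceptual programming question, so bear with me:

Say you have a list of scenes in a movie, and each scene may or may not refer to past or future scenes in the same movie. I'm trying to find the most efficient algorithm for sorting these scenes. Of course, there may not be enough information to sort the scenes completely.

Here is some sample code in Python (pretty much pseudocode) to clarify: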

class Reference:
    def __init__(self, scene_id, relation):
        self.scene_id = scene_id
        self.relation = relation


class Scene:
    def __init__(self, scene_id, references):
        self.id = scene_id
        self.references = references

    def __repr__(self):
        return self.id


def relative_sort(scenes):
    return scenes # Algorithm in question


def main():
    s1 = Scene('s1', [
        Reference('s3', 'after')
    ])
    s2 = Scene('s2', [
        Reference('s1', 'before'),
        Reference('s4', 'after')
    ])
    s3 = Scene('s3', [
        Reference('s4', 'after')
    ])
    s4 = Scene('s4', [
        Reference('s2', 'before')
    ])

    print(relative_sort([s1, s2, s3, s4]))


if __name__ == '__main__':
    main()

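In this case, the goal is for relative_sort to return [s4, s3, s2, s1].

If it helps, I can share my initial attempt at the algorithm; I'm a little embarrassed by how brute-force it is. Also, in case you're wondering, I'm trying to decode the plot of the movie "Mulholland Drive".

FYI: the Python tag is only here because my pseudocode is written in Python.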

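I have included your modified code in my answer. It solves the current (small) problem, but without a larger sample problem I'm not sure how well it scales. If you provide the actual problem you're trying to solve, I'd be happy to test and refine this code until it handles that problem, but without test data I won't optimize this solution any further.

For starters, we track references as sets instead of lists.

  • Duplicates don't really help us (if "s1" is before "s2" and "s1" is before "s2", we have gained no information).
  • This also lets us add inverse references with abandon (if "s1" is before "s2", then "s2" is after "s1").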
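
As a minimal sketch of why sets are convenient here, assume references are modelled as plain hashable tuples. The `Ref` namedtuple and the `add_inverse` helper are illustrative names only and are not part of the answer's code:

```python
from collections import namedtuple

# Hypothetical stand-in for the Reference class: namedtuples are hashable,
# so equal references collapse into a single set element.
Ref = namedtuple('Ref', ['scene_id', 'relation'])

INVERSE = {'before': 'after', 'after': 'before'}


def add_inverse(refs_by_scene):
    """Add the mirrored reference for every known reference."""
    for scene_id, refs in list(refs_by_scene.items()):
        for ref in list(refs):
            mirrored = Ref(scene_id, INVERSE[ref.relation])
            refs_by_scene.setdefault(ref.scene_id, set()).add(mirrored)


refs = {
    's1': {Ref('s2', 'before'), Ref('s2', 'before')},  # duplicate collapses
    's2': {Ref('s1', 'after')},                        # already the inverse
}
add_inverse(refs)
print(refs)  # each scene still holds exactly one reference
```

Adding a reference that is already present is a no-op, so the inverse pass can run without any bookkeeping.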

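We compute a minimum and a maximum position for each scene:

  • The minimum position is based on how many scenes we come after.
  • This could be extended pretty easily: if we come after two scenes with a min_pos of 2, our min_pos is 4 (if one is at 2, the other must be at 3 or later).
  • The maximum position is based on how many scenes we come before.
  • This could be extended similarly: if we come before two scenes with a max_pos of 4, our max_pos is 2 (if one is at 4, the other must be at 3 or earlier).
  • If you decide to do this, just replace the pass in tighten_bounds(self) with code that tries to tighten the bounds for a single scene (and set anything_updated to True if it succeeds).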
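
The tightening rule described above could be sketched roughly as follows. These two helpers are hypothetical (they are not the body of tighten_bounds, which the answer leaves as a stub) and only show the arithmetic, assuming positions are 0-based and `last_pos` is the index of the final slot:

```python
def min_pos_after(pred_min_positions):
    """Lower bound on our position, given min_pos of each scene we come after.

    Predecessors occupy distinct positions, so the (k - i) predecessors whose
    min_pos is at least the i-th smallest value push the latest of them to at
    least that value plus (k - i - 1); we must sit one slot after it.
    """
    ordered = sorted(pred_min_positions)
    k = len(ordered)
    bound = 0
    for i, p in enumerate(ordered):
        bound = max(bound, p + (k - i))
    return bound


def max_pos_before(succ_max_positions, last_pos):
    """Upper bound on our position, given max_pos of each scene we come before."""
    ordered = sorted(succ_max_positions, reverse=True)
    k = len(ordered)
    bound = last_pos
    for i, q in enumerate(ordered):
        # (k - i) successors must fit at or below q, and we precede them all
        bound = min(bound, q - (k - i))
    return bound


print(min_pos_after([2, 2]))               # 4, as in the example above
print(max_pos_before([4, 4], last_pos=5))  # 2, as in the example above
```

A real tighten_bounds would call something like this per scene and report whether any bound changed.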

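The magic is in get_possible_orders:

  • It generates every valid ordering if you iterate over it.
  • If you only want one valid ordering, it doesn't spend time creating all of them.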
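
To illustrate the lazy behaviour on its own, here is a deliberately brute-force sketch (not the pruned search used in this answer). The name `valid_orders` and the (earlier, later) pair encoding are assumptions made for the example:

```python
from itertools import permutations


def valid_orders(scenes, before_pairs):
    """Lazily yield every ordering that honours all (earlier, later) pairs."""
    for order in permutations(scenes):
        pos = {scene: i for i, scene in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in before_pairs):
            yield order


pairs = [('s3', 's1'), ('s2', 's1'), ('s4', 's2'), ('s4', 's3')]
orders = valid_orders(['s1', 's2', 's3', 's4'], pairs)

first = next(orders, None)  # work stops at the first valid ordering
print(first)                # ('s4', 's2', 's3', 's1')

rest = list(orders)         # resumes from where the generator paused
print(rest)                 # [('s4', 's3', 's2', 's1')]
```

Because the function is a generator, asking for one result with next never enumerates the remaining permutations.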

Code:

class Reference:
    def __init__(self, scene_id, relation):
        self.scene_id = scene_id
        self.relation = relation

    def __repr__(self):
        return '"%s %s"' % (self.relation, self.scene_id)

    def __hash__(self):
        return hash(self.scene_id)

    def __eq__(self, other):
        return self.scene_id == other.scene_id and self.relation == other.relation


class Scene:
    def __init__(self, title, references):
        self.title = title
        self.references = references
        self.min_pos = 0
        self.max_pos = None

    def __repr__(self):
        return '%s (%s,%s)' % (self.title, self.min_pos, self.max_pos)

inverse_relation = {'before': 'after', 'after': 'before'}


def inverted_reference(scene, reference):
    return Reference(scene.title, inverse_relation[reference.relation])


def is_valid_addition(scenes_so_far, new_scene, scenes_to_go):
    previous_ids = {s.title for s in scenes_so_far}
    future_ids = {s.title for s in scenes_to_go}
    for ref in new_scene.references:
        if ref.relation == 'before' and ref.scene_id in previous_ids:
            return False
        elif ref.relation == 'after' and ref.scene_id in future_ids:
            return False
    return True


class Movie:
    def __init__(self, scene_list):
        self.num_scenes = len(scene_list)
        self.scene_dict = {scene.title: scene for scene in scene_list}
        self.set_max_positions()
        self.add_inverse_relations()
        self.bound_min_max_pos()
        self.can_tighten = True
        while self.can_tighten:
            self.tighten_bounds()

    def set_max_positions(self):
        for scene in self.scene_dict.values():
            scene.max_pos = self.num_scenes - 1

    def add_inverse_relations(self):
        for scene in self.scene_dict.values():
            for ref in scene.references:
                self.scene_dict[ref.scene_id].references.add(inverted_reference(scene, ref))

    def bound_min_max_pos(self):
        for scene in self.scene_dict.values():
            for ref in scene.references:
                if ref.relation == 'before':
                    scene.max_pos -= 1
                elif ref.relation == 'after':
                    scene.min_pos += 1

    def tighten_bounds(self):
        anything_updated = False
        for scene in self.scene_dict.values():
            pass
            # If bounds for any scene are tightened, set anything_updated back to true
        self.can_tighten = anything_updated

    def get_possible_orders(self, scenes_so_far):
        if len(scenes_so_far) == self.num_scenes:
            yield scenes_so_far
            return  # raising StopIteration inside a generator is an error since Python 3.7
        n = len(scenes_so_far)
        scenes_left = set(self.scene_dict.values()) - set(scenes_so_far)
        valid_next_scenes = set(s
                                for s in scenes_left
                                if s.min_pos <= n <= s.max_pos)
        # valid_next_scenes = sorted(valid_next_scenes, key=lambda s: s.min_pos * self.num_scenes + s.max_pos)
        for s in valid_next_scenes:
            if is_valid_addition(scenes_so_far, s, scenes_left - {s}):
                for valid_complete_sequence in self.get_possible_orders(scenes_so_far + (s,)):
                    yield valid_complete_sequence

    def get_possible_order(self):
        return next(self.get_possible_orders(tuple()))


def relative_sort(lst):
    try:
        return [s.title for s in Movie(lst).get_possible_order()]
    except StopIteration:
        return None


def main():
    s1 = Scene('s1', {Reference('s3', 'after')})
    s2 = Scene('s2', {
        Reference('s1', 'before'),
        Reference('s4', 'after')
    })
    s3 = Scene('s3', {
        Reference('s4', 'after')
    })
    s4 = Scene('s4', {
        Reference('s2', 'before')
    })

    print(relative_sort([s1, s2, s3, s4]))


if __name__ == '__main__':
    main()

The algorithm you are looking for is a topological sort:

In the field of computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks.

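You can compute this quite easily with a graph library such as networkx, which implements topological_sort. First we import the library and list all the relations between scenes, that is, all the directed edges in the graph: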

>>> import networkx as nx
>>> relations = [
...     (3, 1),  # 1 after 3
...     (2, 1),  # 2 before 1
...     (4, 2),  # 2 after 4
...     (4, 3),  # 3 after 4
...     (4, 2)   # 4 before 2
... ]

Then we create a directed graph:

>>> g = nx.DiGraph(relations)

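Then we run a topological sort. In networkx 2.0 and later, topological_sort returns a generator, so wrap it in list. Scenes 2 and 3 are not ordered relative to each other, so [4, 2, 3, 1] is an equally valid result and the exact output can depend on the networkx version:

>>> list(nx.topological_sort(g))
[4, 3, 2, 1]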
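
If a third-party dependency is not wanted, the standard library's graphlib module (Python 3.9 and later) offers the same operation. This is a small sketch using the same four scenes; note that graphlib expects each node mapped to its predecessors, which is the reverse of the edge list above:

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Map each scene to the set of scenes that must come before it.
predecessors = {
    1: {3, 2},
    2: {4},
    3: {4},
    4: set(),
}

try:
    order = list(TopologicalSorter(predecessors).static_order())
except CycleError as exc:
    # The second element of args holds the offending cycle.
    order = None
    print('contradictory constraints:', exc.args[1])

print(order)  # scene 4 first, scene 1 last; 2 and 3 in either order
```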

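As others have pointed out, you need a topological sort. All it takes is a depth-first traversal of the directed graph whose edges are the order relation. Visit the nodes in post-order. That gives the reverse of a topological sort, so to get the topological sort, just reverse the result.

I've encoded your data as a list of pairs showing what is known to come before what. This is just to keep my code short. You can just as easily traverse your list of class instances to create the graph.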
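
One way that conversion could look, assuming the Scene and Reference classes from the question. The helper name `to_order_pairs` is made up for this sketch:

```python
class Reference:
    def __init__(self, scene_id, relation):
        self.scene_id = scene_id
        self.relation = relation


class Scene:
    def __init__(self, scene_id, references):
        self.id = scene_id
        self.references = references


def to_order_pairs(scenes):
    """Turn each reference into an (earlier, later) pair of scene ids."""
    pairs = []
    for scene in scenes:
        for ref in scene.references:
            if ref.relation == 'after':     # scene comes after ref.scene_id
                pairs.append((ref.scene_id, scene.id))
            elif ref.relation == 'before':  # scene comes before ref.scene_id
                pairs.append((scene.id, ref.scene_id))
            else:
                raise ValueError('unknown relation: %r' % ref.relation)
    return pairs


scenes = [
    Scene('s1', [Reference('s3', 'after')]),
    Scene('s2', [Reference('s1', 'before'), Reference('s4', 'after')]),
    Scene('s3', [Reference('s4', 'after')]),
    Scene('s4', [Reference('s2', 'before')]),
]
print(to_order_pairs(scenes))
# [('s3', 's1'), ('s2', 's1'), ('s4', 's2'), ('s4', 's3'), ('s4', 's2')]
```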

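Note that for a topological sort to be meaningful, the set being sorted must satisfy the definition of a partial order. Yours is fine: ordering constraints on events in time naturally satisfy the definition.

Note also that it is perfectly possible to create a graph with cycles. No topological sort exists for such a graph. This implementation doesn't detect cycles, but it would be easy to modify it to do so.

Of course you can use a library to do the topological sort, but where's the fun in that?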

from collections import defaultdict

# Before -> After pairs dictating order. Repeats are okay. Cycles aren't.
# This is OP's data in a friendlier form.
OrderRelation = [('s3','s1'), ('s2','s1'), ('s4','s2'), ('s4','s3'), ('s4','s2')]

class OrderGraph:
  # nodes is an optional iterable of items for use when some aren't related at all
  def __init__(self, relation, nodes=()):
    self.succ = defaultdict(set) # Successor map
    heads = set()
    for tail, head in relation:
      self.succ[tail].add(head)
      heads.add(head)
    # Sources are nodes that have no in-edges (tails - heads)
    self.sources = set(self.succ.keys()) - heads | set(nodes)

  # Recursive helper to traverse the graph and visit in post order
  def __traverse(self, start):
    if start in self.visited: return
    self.visited.add(start)
    for succ in self.succ[start]: self.__traverse(succ)
    self.sorted.append(start) # Append in post-order

  # Return a reverse post-order visit, which is a topo sort. Not thread safe.
  def topoSort(self):
    self.visited = set()
    self.sorted = []
    for source in self.sources: self.__traverse(source)
    self.sorted.reverse()
    return self.sorted

Then...

>>> print(OrderGraph(OrderRelation).topoSort())
['s4', 's2', 's3', 's1']

>>> print(OrderGraph(OrderRelation, ['s1', 'unordered']).topoSort())
['s4', 's2', 's3', 'unordered', 's1']

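The second call shows that you can optionally pass the values to be sorted in a separate list. You may, but don't have to, include values that already appear in the relation pairs. Of course, values not mentioned in any order pair are free to appear anywhere in the output. Because the graph is stored in sets, the position of items that are not constrained relative to each other can differ from run to run.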
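
As for the cycle detection mentioned earlier, one possible modification is to track which nodes are still on the current DFS path. The following is a standalone sketch rather than a patch to OrderGraph; the function name `topo_sort_checked` and the choice of ValueError are assumptions:

```python
from collections import defaultdict


def topo_sort_checked(relation, nodes=()):
    """Reverse post-order DFS that raises ValueError when it finds a cycle."""
    succ = defaultdict(set)
    for tail, head in relation:
        succ[tail].add(head)
        succ[head]  # touch the key so every node has an entry
    for node in nodes:
        succ[node]  # register unrelated nodes too

    IN_PROGRESS, DONE = 1, 2
    state = {}
    post_order = []

    def visit(node):
        if state.get(node) == DONE:
            return
        if state.get(node) == IN_PROGRESS:
            # We reached a node that is still on the current DFS path.
            raise ValueError('cycle through %r' % (node,))
        state[node] = IN_PROGRESS
        for nxt in succ[node]:
            visit(nxt)
        state[node] = DONE
        post_order.append(node)

    for node in list(succ):
        visit(node)
    post_order.reverse()
    return post_order


pairs = [('s3', 's1'), ('s2', 's1'), ('s4', 's2'), ('s4', 's3'), ('s4', 's2')]
print(topo_sort_checked(pairs))

try:
    topo_sort_checked([('a', 'b'), ('b', 'a')])
except ValueError as err:
    print('rejected:', err)
```

Like the original, this version is recursive, so very long chains of scenes could hit Python's recursion limit.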