Sorting a List with Relative Positional Data
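This is more of a conceptual programming question, so bear with me:
Say you have a list of scenes in a movie, and each scene may or may not refer to past or future scenes in the same movie. I'm trying to find the most efficient algorithm for sorting these scenes. Of course, there may not be enough information to sort the scenes completely.
Here is some sample code in Python (pretty much pseudocode) to clarify: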
    class Reference:
        def __init__(self, scene_id, relation):
            self.scene_id = scene_id
            self.relation = relation

    class Scene:
        def __init__(self, scene_id, references):
            self.id = scene_id
            self.references = references

        def __repr__(self):
            return self.id

    def relative_sort(scenes):
        return scenes  # Algorithm in question

    def main():
        s1 = Scene('s1', [
            Reference('s3', 'after')
        ])
        s2 = Scene('s2', [
            Reference('s1', 'before'),
            Reference('s4', 'after')
        ])
        s3 = Scene('s3', [
            Reference('s4', 'after')
        ])
        s4 = Scene('s4', [
            Reference('s2', 'before')
        ])
        print(relative_sort([s1, s2, s3, s4]))

    if __name__ == '__main__':
        main()
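In this case, the goal is for relative_sort to return [s4, s3, s2, s1].
If it helps, I can share my initial attempt at the algorithm; I'm a little embarrassed by how brute-force it is. Also, in case you're wondering, I'm trying to decode the plot of the film "Mulholland Drive".
FYI: the Python tag is only here because my pseudocode is written in Python.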
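I have included your modified code in my answer. It solves the current (small) problem, but without a larger sample problem I'm not sure how well it scales. If you provide the actual problem you're trying to solve, I'd be happy to test and refine this code until it handles that problem, but without test data I won't optimize this solution any further.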
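First, we keep track of references as sets rather than lists.
- Duplicates don't really help us (being told twice that "s1" is before "s2" gives us no extra information).
- This also lets us add inverse references with abandon (if "s1" is before "s2", then "s2" is after "s1").
We compute a minimum and maximum position for each scene:
- The minimum position is based on how many scenes we come after.
  - This could easily be extended: if we come after two scenes that each have a min_pos of 2, our min_pos is 4 (if one is at 2, the other must be at 3 or later).
- The maximum position is based on how many scenes we come before.
  - This could be extended similarly: if we come before two scenes that each have a max_pos of 4, our max_pos is 2 (if one is at 4, the other must be at 3 or earlier).
- If you decide to do this, just replace the pass in tighten_bounds(self) with code that tries to tighten the bounds for a single scene (and sets anything_updated to True if it succeeds).
The magic is in get_possible_orders:
- If you iterate over it, it generates every valid order.
- If you only want one valid order, it doesn't spend time creating all of them.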
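As a rough illustration of the bounds idea, here is a minimal standalone sketch. It is not the tightening rule described above: it uses a simpler technique, counting every scene that must come before or after a given scene, directly or indirectly. The function name position_bounds and the (earlier, later) pair format are assumptions made for this example, and the input is assumed to contain no cycles.

```python
def position_bounds(before_pairs, scenes):
    """Return {scene: (min_pos, max_pos)} from (earlier, later) pairs.

    Hypothetical helper: min_pos is the number of scenes that must come
    earlier, max_pos is n - 1 minus the number that must come later.
    Assumes the pairs contain no cycles.
    """
    succ = {s: set() for s in scenes}
    pred = {s: set() for s in scenes}
    for earlier, later in before_pairs:
        succ[earlier].add(later)
        pred[later].add(earlier)

    def reachable(start, edges):
        # Iterative depth-first search collecting every node reachable from start.
        seen, stack = set(), list(edges[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(edges[node])
        return seen

    n = len(scenes)
    return {
        s: (len(reachable(s, pred)), n - 1 - len(reachable(s, succ)))
        for s in scenes
    }


pairs = [('s3', 's1'), ('s2', 's1'), ('s4', 's2'), ('s4', 's3')]
bounds = position_bounds(pairs, ['s1', 's2', 's3', 's4'])
print(bounds)
# {'s1': (3, 3), 's2': (1, 2), 's3': (1, 2), 's4': (0, 0)}
```

For the sample data this pins s4 to position 0 and s1 to position 3, leaving only s2 and s3 free to swap.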
Code:
    class Reference:
        def __init__(self, scene_id, relation):
            self.scene_id = scene_id
            self.relation = relation

        def __repr__(self):
            return '"%s %s"' % (self.relation, self.scene_id)

        def __hash__(self):
            return hash(self.scene_id)

        def __eq__(self, other):
            return self.scene_id == other.scene_id and self.relation == other.relation

    class Scene:
        def __init__(self, title, references):
            self.title = title
            self.references = references
            self.min_pos = 0
            self.max_pos = None

        def __repr__(self):
            return '%s (%s,%s)' % (self.title, self.min_pos, self.max_pos)

    inverse_relation = {'before': 'after', 'after': 'before'}

    def inverted_reference(scene, reference):
        return Reference(scene.title, inverse_relation[reference.relation])

    def is_valid_addition(scenes_so_far, new_scene, scenes_to_go):
        previous_ids = {s.title for s in scenes_so_far}
        future_ids = {s.title for s in scenes_to_go}
        for ref in new_scene.references:
            # new_scene must precede ref.scene_id, but that scene is already placed
            if ref.relation == 'before' and ref.scene_id in previous_ids:
                return False
            # new_scene must follow ref.scene_id, but that scene is not placed yet
            elif ref.relation == 'after' and ref.scene_id in future_ids:
                return False
        return True

    class Movie:
        def __init__(self, scene_list):
            self.num_scenes = len(scene_list)
            self.scene_dict = {scene.title: scene for scene in scene_list}
            self.set_max_positions()
            self.add_inverse_relations()
            self.bound_min_max_pos()
            self.can_tighten = True
            while self.can_tighten:
                self.tighten_bounds()

        def set_max_positions(self):
            for scene in self.scene_dict.values():
                scene.max_pos = self.num_scenes - 1

        def add_inverse_relations(self):
            for scene in self.scene_dict.values():
                for ref in scene.references:
                    self.scene_dict[ref.scene_id].references.add(inverted_reference(scene, ref))

        def bound_min_max_pos(self):
            for scene in self.scene_dict.values():
                for ref in scene.references:
                    if ref.relation == 'before':
                        scene.max_pos -= 1
                    elif ref.relation == 'after':
                        scene.min_pos += 1

        def tighten_bounds(self):
            anything_updated = False
            for scene in self.scene_dict.values():
                pass
                # If bounds for any scene are tightened, set anything_updated back to True
            self.can_tighten = anything_updated

        def get_possible_orders(self, scenes_so_far):
            if len(scenes_so_far) == self.num_scenes:
                yield scenes_so_far
                return  # a generator must return, not raise StopIteration (PEP 479)
            n = len(scenes_so_far)
            scenes_left = set(self.scene_dict.values()) - set(scenes_so_far)
            valid_next_scenes = set(s
                                    for s in scenes_left
                                    if s.min_pos <= n <= s.max_pos)
            # valid_next_scenes = sorted(valid_next_scenes, key=lambda s: s.min_pos * self.num_scenes + s.max_pos)
            for s in valid_next_scenes:
                if is_valid_addition(scenes_so_far, s, scenes_left - {s}):
                    for valid_complete_sequence in self.get_possible_orders(scenes_so_far + (s,)):
                        yield valid_complete_sequence

        def get_possible_order(self):
            # Raises StopIteration if no valid order exists
            return next(self.get_possible_orders(tuple()))

    def relative_sort(lst):
        try:
            return [s.title for s in Movie(lst).get_possible_order()]
        except StopIteration:
            return None

    def main():
        s1 = Scene('s1', {Reference('s3', 'after')})
        s2 = Scene('s2', {
            Reference('s1', 'before'),
            Reference('s4', 'after')
        })
        s3 = Scene('s3', {
            Reference('s4', 'after')
        })
        s4 = Scene('s4', {
            Reference('s2', 'before')
        })
        print(relative_sort([s1, s2, s3, s4]))

    if __name__ == '__main__':
        main()
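To show the "generate lazily, stop after the first" behaviour in isolation, here is a small sketch that is independent of the classes above. The name all_orders and the dict-of-predecessor-sets input are assumptions made for this example; it is a plain backtracking generator, not a reimplementation of get_possible_orders with its position bounds.

```python
def all_orders(preds, placed=()):
    """Yield every ordering consistent with preds, one at a time.

    preds maps each scene to the set of scenes that must come before it
    (hypothetical input format chosen for this sketch).
    """
    if len(placed) == len(preds):
        yield list(placed)
        return
    done = set(placed)
    for scene in sorted(preds):
        # A scene can be placed next once everything before it is placed.
        if scene not in done and preds[scene] <= done:
            yield from all_orders(preds, placed + (scene,))


preds = {'s1': {'s2', 's3'}, 's2': {'s4'}, 's3': {'s4'}, 's4': set()}

first = next(all_orders(preds))      # only does the work needed for one order
everything = list(all_orders(preds))  # exhausts the generator

print(first)       # ['s4', 's2', 's3', 's1']
print(everything)  # [['s4', 's2', 's3', 's1'], ['s4', 's3', 's2', 's1']]
```

Calling next() stops the search as soon as one complete order has been built, while list() keeps backtracking until every valid order has been produced.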
The algorithm you are looking for is a topological sort:
In the field of computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks.
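You can compute it very easily with a graph library such as networkx, which implements topological_sort. First we import the library and list all the relations between scenes, that is, all the directed edges of the graph:

    >>> import networkx as nx
    >>> relations = [
    ...     (3, 1),  # 1 after 3
    ...     (2, 1),  # 2 before 1
    ...     (4, 2),  # 2 after 4
    ...     (4, 3),  # 3 after 4
    ...     (4, 2),  # 4 before 2
    ... ]

Then we create a directed graph:

    >>> g = nx.DiGraph(relations)

Then we run a topological sort:

    >>> list(nx.topological_sort(g))
    [4, 3, 2, 1]

In networkx 2.0 and later, topological_sort returns an iterator, hence the list() call. Scenes 2 and 3 are not ordered relative to each other, so [4, 2, 3, 1] is an equally valid result, and which one you get may depend on the networkx version.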
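If installing a third-party library is not an option, the Python standard library (3.9 and later) ships graphlib.TopologicalSorter, which does the same job. The sketch below uses the same scene numbers; note that graphlib expects a mapping from each node to the nodes that must come before it, which is the reverse of the edge list used above. The variable names are chosen for this example.

```python
from graphlib import TopologicalSorter, CycleError

# Each key maps to the set of scenes that must come before it.
predecessors = {1: {3, 2}, 2: {4}, 3: {4}}

order = list(TopologicalSorter(predecessors).static_order())
print(order)  # 4 comes first and 1 last; 2 and 3 may appear in either order

# Contradictory references form a cycle, which graphlib reports as CycleError.
try:
    list(TopologicalSorter({1: {2}, 2: {1}}).static_order())
    cycle_found = False
except CycleError:
    cycle_found = True
print(cycle_found)  # True
```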
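As others have pointed out, you need a topological sort. All it takes is a depth-first traversal of the directed graph whose edges are given by the order relation. Visit the nodes in post-order. That yields the reverse of a topological sort, so to get the topological sort, just reverse the result.
I've encoded your data as a list of pairs showing what is known to come before what. This is only to keep my code short. You could just as easily traverse your list of classes to build the graph.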
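A minimal sketch of that conversion, assuming objects shaped like the question's Scene and Reference classes. The namedtuples here are stand-ins so that the example is self-contained, and to_pairs is a hypothetical helper name.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the question's Scene and Reference classes.
Reference = namedtuple('Reference', 'scene_id relation')
Scene = namedtuple('Scene', 'id references')


def to_pairs(scenes):
    """Return a sorted list of unique (earlier, later) pairs."""
    pairs = set()
    for scene in scenes:
        for ref in scene.references:
            if ref.relation == 'after':      # scene happens after ref.scene_id
                pairs.add((ref.scene_id, scene.id))
            elif ref.relation == 'before':   # scene happens before ref.scene_id
                pairs.add((scene.id, ref.scene_id))
            else:
                raise ValueError('unknown relation: %r' % (ref.relation,))
    return sorted(pairs)


scenes = [
    Scene('s1', [Reference('s3', 'after')]),
    Scene('s2', [Reference('s1', 'before'), Reference('s4', 'after')]),
    Scene('s3', [Reference('s4', 'after')]),
    Scene('s4', [Reference('s2', 'before')]),
]
pairs = to_pairs(scenes)
print(pairs)
# [('s2', 's1'), ('s3', 's1'), ('s4', 's2'), ('s4', 's3')]
```

Collecting the pairs in a set drops the duplicate constraint (s4 before s2 is stated twice in the question's data).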
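Note that for a topological sort to be meaningful, the set being sorted must satisfy the definition of a partial order. Yours is fine: order constraints on temporal events naturally satisfy the definition.
Note also that it is perfectly possible to create a graph with cycles. No topological sort exists for such a graph. This implementation does not detect cycles, but it would be easy to modify it to do so.
Of course you could use a library to do the topological sort, but where's the fun in that?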
    from collections import defaultdict

    # Before -> After pairs dictating order. Repeats are okay. Cycles aren't.
    # This is OP's data in a friendlier form.
    OrderRelation = [('s3','s1'), ('s2','s1'), ('s4','s2'), ('s4','s3'), ('s4','s2')]

    class OrderGraph:
        # nodes is an optional list of items for use when some aren't related at all
        def __init__(self, relation, nodes=()):
            self.succ = defaultdict(set)  # Successor map
            heads = set()
            for tail, head in relation:
                self.succ[tail].add(head)
                heads.add(head)
            # Sources are nodes that have no in-edges (tails - heads),
            # plus any explicitly listed nodes
            self.sources = (set(self.succ.keys()) - heads) | set(nodes)

        # Recursive helper to traverse the graph and visit in post order
        def __traverse(self, start):
            if start in self.visited:
                return
            self.visited.add(start)
            for succ in self.succ[start]:
                self.__traverse(succ)
            self.sorted.append(start)  # Append in post-order

        # Return a reverse post-order visit, which is a topo sort. Not thread safe.
        def topoSort(self):
            self.visited = set()
            self.sorted = []
            for source in self.sources:
                self.__traverse(source)
            self.sorted.reverse()
            return self.sorted
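Then...

    >>> print(OrderGraph(OrderRelation).topoSort())
    ['s4', 's2', 's3', 's1']
    >>> print(OrderGraph(OrderRelation, ['s1', 'unordered']).topoSort())
    ['s4', 's2', 's3', 'unordered', 's1']

The second call shows that you can optionally pass the values to be sorted in a separate list. You may, but need not, include values that already appear in the relation pairs. Of course, values that are not mentioned in any order pair are free to appear anywhere in the output. Because the graph is stored in sets, items that are not constrained relative to each other (s2 and s3, or 'unordered') may land in a different position from run to run.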
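The cycle detection mentioned earlier can be sketched as follows. This is one common way to do it, not a patch to the OrderGraph class: a standalone function (the name topo_sort_checked is made up for this example) that marks each node as in progress while its successors are being visited, and treats reaching an in-progress node as a cycle. Successors are visited in sorted order so that the output is deterministic.

```python
from collections import defaultdict


def topo_sort_checked(pairs):
    """Topologically sort (earlier, later) pairs; raise ValueError on a cycle."""
    succ = defaultdict(set)
    nodes = set()
    for earlier, later in pairs:
        succ[earlier].add(later)
        nodes.update((earlier, later))

    IN_PROGRESS, DONE = 1, 2
    state = {}
    post_order = []

    def visit(node):
        if state.get(node) == DONE:
            return
        if state.get(node) == IN_PROGRESS:
            # The node is still on the recursion stack, so we followed a back edge.
            raise ValueError('cycle involving %r' % (node,))
        state[node] = IN_PROGRESS
        for nxt in sorted(succ.get(node, ())):
            visit(nxt)
        state[node] = DONE
        post_order.append(node)

    for node in sorted(nodes):
        visit(node)
    return post_order[::-1]  # reverse post-order is a topological order


order = topo_sort_checked([('s3', 's1'), ('s2', 's1'), ('s4', 's2'), ('s4', 's3')])
print(order)  # ['s4', 's3', 's2', 's1']

try:
    topo_sort_checked([('a', 'b'), ('b', 'a')])
except ValueError as exc:
    print(exc)  # cycle involving 'a'
```

Like the original, this version is recursive, so very long chains of scenes could hit Python's recursion limit.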