Python Dijkstra 算法实现中的冗余检查
Redundant Checks in Python Implementation of Dijkstra's Algorithm
编辑:我添加了一些输出以突出显示我认为的问题所在。
Dijkstra 算法有很多版本,当您学习时很难评估它们的质量。
下面的实现似乎来自信誉良好的来源 (https://bradfieldcs.com/algos/graphs/dijkstras-algorithm/)
但是,似乎由于此版本不跟踪已访问的节点,所以这些行:
for neighbor, weight in graph[current_vertex].items():
distance = current_distance + weight
可能会产生大量不必要的检查。
下面代码的输出是:
Neighbor: V, weight: 6, distance: 6
Neighbor: W, weight: 7, distance: 7
Neighbor: U, weight: 6, distance: 12
Neighbor: X, weight: 10, distance: 16
Neighbor: U, weight: 7, distance: 14
Neighbor: X, weight: 1, distance: 8
Neighbor: W, weight: 1, distance: 9
Neighbor: V, weight: 10, distance: 18
{'U': 0, 'V': 6, 'W': 7, 'X': 8}
对我来说,这表明该算法正在做不必要的工作,因为节点 U
多次成为邻居,其距离计算为已计算距离的两倍,因此被拒绝。我的理解是,一旦一个节点被处理,就不需要再考虑了。我可能误解了算法,但这对我来说看起来很可疑。
既然跟踪访问过的节点似乎是 Dijkstra 算法定义的组成部分,那么可以说这个特定的实现“不太好”吗?还是我遗漏了什么?
很高兴在 Python 中看到 Dijkstra 算法的“最佳实践”版本,最好对图使用相同类型的结构。
import heapq
def calculate_distances(graph, starting_vertex):
distances = {vertex: float('infinity') for vertex in graph}
distances[starting_vertex] = 0
pq = [(0, starting_vertex)]
while len(pq) > 0:
current_distance, current_vertex = heapq.heappop(pq)
# Nodes can get added to the priority queue multiple times. We only
# process a vertex the first time we remove it from the priority queue.
if current_distance > distances[current_vertex]:
continue
for neighbor, weight in graph[current_vertex].items():
distance = current_distance + weight
print(f"Neighbor: {neighbor}, weight: {weight}, distance: {distance}")
# Only consider this new path if it's better than any path we've
# already found.
if distance < distances[neighbor]:
distances[neighbor] = distance
heapq.heappush(pq, (distance, neighbor))
return distances
example_graph = {
'U': {'V': 6, 'W': 7},
'V': {'U': 6, 'X': 10},
'W': {'U': 7, 'X': 1},
'X': {'W': 1, 'V': 10}
}
print(calculate_distances(example_graph, 'U'))
My understanding is that once a node is processed, it no longer needs to be considered.
如果你的意思是“考虑”是计算了它沿路径的距离,那么这是真的,但也要考虑到将距离与迄今为止的最佳值进行比较并不比检查邻居是否被计算复杂得多已经访问过。在任何一种情况下(算法),真正 visited 节点(即已经从堆中 popped 的节点)将永远不会被推入堆再次.
让我们看一下该算法的一个变体,其中(仅)使用“已访问”的概念来确定是否应将邻居放入堆中。我有意尝试限制代码更改,以便更好地突出差异:
INF = float('infinity')
def calculate_distances_2(graph, starting_vertex):
distances = {vertex: INF for vertex in graph}
pq = [(0, starting_vertex)]
while len(pq) > 0:
current_distance, current_vertex = heapq.heappop(pq)
if distances[current_vertex] != INF: # Already visited?
continue
distances[current_vertex] = current_distance
for neighbor, weight in graph[current_vertex].items():
print(f"Neighbor: {neighbor}, weight: {weight}, goes on heap? {distances[neighbor] == INF}")
if distances[neighbor] == INF: # Not yet visited?
heapq.heappush(pq, (current_distance + weight, neighbor))
return distances
那么这里有什么不同呢?
节点的距离仅在节点从堆中弹出时设置,这也用于将节点标记为已访问:它不再具有 Infinity 作为关联距离。这意味着:
- 我们不在循环开始前设置
distances[starting_vertex] = 0
。
- 我们只检查邻居是否被访问过(隐含地,通过检查
distances[starting_vertex]
是否为Infinity),但不比较当前邻居的距离是 改进 。现在这完全留给堆机制
当节点已经访问时,不必计算邻居沿当前路径的距离。
第一点实际上意味着第二种算法可能会(再次)将节点推入堆中,而第一种算法可能不会。在最坏的情况下没有差异,但在随机情况下我们可以预期会发生这种差异。这是因为第一个算法使用了更多的信息:当同一个节点已经在堆上出现一次或多次时,第一个算法知道到该节点的遍历路径中的 最短 距离,而第二个算法“只”知道这个节点还没有被访问过(即还没有被弹出)。
具体例子
对于您的示例,没有区别。我试过这张图:
...并使用下面的代码进行比较。请注意,我更改了您的 print
调用:我删除了 distance
的输出(因为在第二种算法中尚未计算),并添加了更多信息:邻居是否将被推入堆中与否 (False/True):
import heapq
INF = float('infinity')
def calculate_distances(graph, starting_vertex):
distances = {vertex: INF for vertex in graph}
distances[starting_vertex] = 0
pq = [(0, starting_vertex)]
while len(pq) > 0:
current_distance, current_vertex = heapq.heappop(pq)
if current_distance > distances[current_vertex]:
continue
for neighbor, weight in graph[current_vertex].items():
distance = current_distance + weight
print(f"Neighbor: {neighbor}, weight: {weight}, goes on heap?: {distance < distances[neighbor]}")
if distance < distances[neighbor]:
distances[neighbor] = distance
heapq.heappush(pq, (distance, neighbor))
return distances
### Alternative
def calculate_distances_2(graph, starting_vertex):
distances = {vertex: INF for vertex in graph}
pq = [(0, starting_vertex)]
while len(pq) > 0:
current_distance, current_vertex = heapq.heappop(pq)
if distances[current_vertex] != INF:
continue
distances[current_vertex] = current_distance
for neighbor, weight in graph[current_vertex].items():
print(f"Neighbor: {neighbor}, weight: {weight}, goes on heap? {distances[neighbor] == INF}")
if distances[neighbor] == INF:
heapq.heappush(pq, (current_distance + weight, neighbor))
return distances
example_graph = {
"0": { "1": 2, "2": 6 },
"1": { "0": 2, "3": 5 },
"2": { "0": 6, "3": 8 },
"3": { "1": 5, "2": 8, "4": 10, "5": 15 },
"4": { "3": 10, "5": 6, "6": 2 },
"5": { "3": 15, "4": 6, "6": 6 },
"6": { "4": 2, "5": 6 }
}
print(calculate_distances(example_graph, '0'))
print(calculate_distances_2(example_graph, '0'))
我这里只提供第一种算法生成的输出,并标记第二种算法有不同输出的行:
Neighbor: 1, weight: 2, goes on heap?: True
Neighbor: 2, weight: 6, goes on heap?: True
Neighbor: 0, weight: 2, goes on heap?: False
Neighbor: 3, weight: 5, goes on heap?: True
Neighbor: 0, weight: 6, goes on heap?: False
Neighbor: 3, weight: 8, goes on heap?: False ****
Neighbor: 1, weight: 5, goes on heap?: False
Neighbor: 2, weight: 8, goes on heap?: False
Neighbor: 4, weight: 10, goes on heap?: True
Neighbor: 5, weight: 15, goes on heap?: True
Neighbor: 3, weight: 10, goes on heap?: False
Neighbor: 5, weight: 6, goes on heap?: False ****
Neighbor: 6, weight: 2, goes on heap?: True
Neighbor: 4, weight: 2, goes on heap?: False
Neighbor: 5, weight: 6, goes on heap?: False ****
Neighbor: 3, weight: 15, goes on heap?: False
Neighbor: 4, weight: 6, goes on heap?: False
Neighbor: 6, weight: 6, goes on heap?: False
{'0': 0, '1': 2, '2': 6, '3': 7, '4': 17, '5': 22, '6': 19}
输出不同的地方(3处)表示第一个算法输出False
和第二个True
.
结论
Attribute
First algorithm
Second algorithm
Heap size
Better
Worse
Additions
Worse
Better
堆大小在随机情况下更能决定执行时间,因此第一个算法预计 运行 稍快。
编辑:我添加了一些输出以突出显示我认为的问题所在。
Dijkstra 算法有很多版本,当您学习时很难评估它们的质量。
下面的实现似乎来自信誉良好的来源 (https://bradfieldcs.com/algos/graphs/dijkstras-algorithm/)
但是,似乎由于此版本不跟踪已访问的节点,所以这些行:
for neighbor, weight in graph[current_vertex].items():
distance = current_distance + weight
可能会产生大量不必要的检查。
下面代码的输出是:
Neighbor: V, weight: 6, distance: 6
Neighbor: W, weight: 7, distance: 7
Neighbor: U, weight: 6, distance: 12
Neighbor: X, weight: 10, distance: 16
Neighbor: U, weight: 7, distance: 14
Neighbor: X, weight: 1, distance: 8
Neighbor: W, weight: 1, distance: 9
Neighbor: V, weight: 10, distance: 18
{'U': 0, 'V': 6, 'W': 7, 'X': 8}
对我来说,这表明该算法正在做不必要的工作,因为节点 U
多次成为邻居,其距离计算为已计算距离的两倍,因此被拒绝。我的理解是,一旦一个节点被处理,就不需要再考虑了。我可能误解了算法,但这对我来说看起来很可疑。
既然跟踪访问过的节点似乎是 Dijkstra 算法定义的组成部分,那么可以说这个特定的实现“不太好”吗?还是我遗漏了什么?
很高兴在 Python 中看到 Dijkstra 算法的“最佳实践”版本,最好对图使用相同类型的结构。
import heapq
def calculate_distances(graph, starting_vertex):
distances = {vertex: float('infinity') for vertex in graph}
distances[starting_vertex] = 0
pq = [(0, starting_vertex)]
while len(pq) > 0:
current_distance, current_vertex = heapq.heappop(pq)
# Nodes can get added to the priority queue multiple times. We only
# process a vertex the first time we remove it from the priority queue.
if current_distance > distances[current_vertex]:
continue
for neighbor, weight in graph[current_vertex].items():
distance = current_distance + weight
print(f"Neighbor: {neighbor}, weight: {weight}, distance: {distance}")
# Only consider this new path if it's better than any path we've
# already found.
if distance < distances[neighbor]:
distances[neighbor] = distance
heapq.heappush(pq, (distance, neighbor))
return distances
example_graph = {
'U': {'V': 6, 'W': 7},
'V': {'U': 6, 'X': 10},
'W': {'U': 7, 'X': 1},
'X': {'W': 1, 'V': 10}
}
print(calculate_distances(example_graph, 'U'))
My understanding is that once a node is processed, it no longer needs to be considered.
如果你的意思是“考虑”是计算了它沿路径的距离,那么这是真的,但也要考虑到将距离与迄今为止的最佳值进行比较并不比检查邻居是否被计算复杂得多已经访问过。在任何一种情况下(算法),真正 visited 节点(即已经从堆中 popped 的节点)将永远不会被推入堆再次.
让我们看一下该算法的一个变体,其中(仅)使用“已访问”的概念来确定是否应将邻居放入堆中。我有意尝试限制代码更改,以便更好地突出差异:
INF = float('infinity')
def calculate_distances_2(graph, starting_vertex):
distances = {vertex: INF for vertex in graph}
pq = [(0, starting_vertex)]
while len(pq) > 0:
current_distance, current_vertex = heapq.heappop(pq)
if distances[current_vertex] != INF: # Already visited?
continue
distances[current_vertex] = current_distance
for neighbor, weight in graph[current_vertex].items():
print(f"Neighbor: {neighbor}, weight: {weight}, goes on heap? {distances[neighbor] == INF}")
if distances[neighbor] == INF: # Not yet visited?
heapq.heappush(pq, (current_distance + weight, neighbor))
return distances
那么这里有什么不同呢?
节点的距离仅在节点从堆中弹出时设置,这也用于将节点标记为已访问:它不再具有 Infinity 作为关联距离。这意味着:
- 我们不在循环开始前设置
distances[starting_vertex] = 0
。 - 我们只检查邻居是否被访问过(隐含地,通过检查
distances[starting_vertex]
是否为Infinity),但不比较当前邻居的距离是 改进 。现在这完全留给堆机制
- 我们不在循环开始前设置
当节点已经访问时,不必计算邻居沿当前路径的距离。
第一点实际上意味着第二种算法可能会(再次)将节点推入堆中,而第一种算法可能不会。在最坏的情况下没有差异,但在随机情况下我们可以预期会发生这种差异。这是因为第一个算法使用了更多的信息:当同一个节点已经在堆上出现一次或多次时,第一个算法知道到该节点的遍历路径中的 最短 距离,而第二个算法“只”知道这个节点还没有被访问过(即还没有被弹出)。
具体例子
对于您的示例,没有区别。我试过这张图:
...并使用下面的代码进行比较。请注意,我更改了您的 print
调用:我删除了 distance
的输出(因为在第二种算法中尚未计算),并添加了更多信息:邻居是否将被推入堆中与否 (False/True):
import heapq
INF = float('infinity')
def calculate_distances(graph, starting_vertex):
distances = {vertex: INF for vertex in graph}
distances[starting_vertex] = 0
pq = [(0, starting_vertex)]
while len(pq) > 0:
current_distance, current_vertex = heapq.heappop(pq)
if current_distance > distances[current_vertex]:
continue
for neighbor, weight in graph[current_vertex].items():
distance = current_distance + weight
print(f"Neighbor: {neighbor}, weight: {weight}, goes on heap?: {distance < distances[neighbor]}")
if distance < distances[neighbor]:
distances[neighbor] = distance
heapq.heappush(pq, (distance, neighbor))
return distances
### Alternative
def calculate_distances_2(graph, starting_vertex):
distances = {vertex: INF for vertex in graph}
pq = [(0, starting_vertex)]
while len(pq) > 0:
current_distance, current_vertex = heapq.heappop(pq)
if distances[current_vertex] != INF:
continue
distances[current_vertex] = current_distance
for neighbor, weight in graph[current_vertex].items():
print(f"Neighbor: {neighbor}, weight: {weight}, goes on heap? {distances[neighbor] == INF}")
if distances[neighbor] == INF:
heapq.heappush(pq, (current_distance + weight, neighbor))
return distances
example_graph = {
"0": { "1": 2, "2": 6 },
"1": { "0": 2, "3": 5 },
"2": { "0": 6, "3": 8 },
"3": { "1": 5, "2": 8, "4": 10, "5": 15 },
"4": { "3": 10, "5": 6, "6": 2 },
"5": { "3": 15, "4": 6, "6": 6 },
"6": { "4": 2, "5": 6 }
}
print(calculate_distances(example_graph, '0'))
print(calculate_distances_2(example_graph, '0'))
我这里只提供第一种算法生成的输出,并标记第二种算法有不同输出的行:
Neighbor: 1, weight: 2, goes on heap?: True
Neighbor: 2, weight: 6, goes on heap?: True
Neighbor: 0, weight: 2, goes on heap?: False
Neighbor: 3, weight: 5, goes on heap?: True
Neighbor: 0, weight: 6, goes on heap?: False
Neighbor: 3, weight: 8, goes on heap?: False ****
Neighbor: 1, weight: 5, goes on heap?: False
Neighbor: 2, weight: 8, goes on heap?: False
Neighbor: 4, weight: 10, goes on heap?: True
Neighbor: 5, weight: 15, goes on heap?: True
Neighbor: 3, weight: 10, goes on heap?: False
Neighbor: 5, weight: 6, goes on heap?: False ****
Neighbor: 6, weight: 2, goes on heap?: True
Neighbor: 4, weight: 2, goes on heap?: False
Neighbor: 5, weight: 6, goes on heap?: False ****
Neighbor: 3, weight: 15, goes on heap?: False
Neighbor: 4, weight: 6, goes on heap?: False
Neighbor: 6, weight: 6, goes on heap?: False
{'0': 0, '1': 2, '2': 6, '3': 7, '4': 17, '5': 22, '6': 19}
输出不同的地方(3处)表示第一个算法输出False
和第二个True
.
结论
Attribute | First algorithm | Second algorithm |
---|---|---|
Heap size | Better | Worse |
Additions | Worse | Better |
堆大小在随机情况下更能决定执行时间,因此第一个算法预计 运行 稍快。