未加权的有向图距离
Unweighted directed graph distances
假设我有一个未加权的有向图。我想知道是否有一种方法可以存储起始节点和图中所有剩余节点之间的所有距离。我知道 Dijkstra 的算法可能是一种选择,但我不确定这是否是最好的算法,因为我正在处理一个相当大的图(~100k 节点),而且它是一个未加权的图。到目前为止,我的挑战是执行 BFS,同时尝试存储所有距离。这是可行的方法吗?
最后,由于我对图论还很陌生,有人可以为我指明正确的方向,以便 Python 很好地解决此类问题吗?
绝对可行,而且如果您的数据结构包含一个以起始节点标识符为索引的每个起始节点的结束节点列表:
这是一个使用字典作为边的示例:{startNode:端节点列表}
from collections import deque
maxDistance = 0
def getDistances(origin,edges):
global maxDistance
maxDistance = 0
distances = {origin:0} # {endNode:distance from origin}
toLink = deque([origin]) # start at origin (distance=0)
while toLink:
start = toLink.popleft() # previous end, will chain to next
dist = distances[start] + 1 # new next are at +1
for end in edges[start]: # next end nodes
if end in distances: continue # new ones only
distances[end] = dist # record distance
toLink.append(end) # will link from there
maxDistance = max(maxDistance,dist)
return distances
这对每个节点进行一次迭代(不包括无法访问的节点)并使用快速字典访问来跟踪到新的下一个节点的链接
使用一些随机测试数据(1000 万条边)...
import random
from collections import defaultdict
print("loading simulated graphs")
vertexCount = 100000
edgeCount = vertexCount * 100
edges = defaultdict(set)
edgesLoaded = 0
minSpan = 1 # vertexCount//2
while edgesLoaded<edgeCount:
start = random.randrange(vertexCount)
end = random.randrange(vertexCount)
if abs(start-end) > minSpan and end not in edges[start]:
edges[start].add(end)
edgesLoaded += 1
print("loaded!")
性能:
# starting from a randomly selected node
origin = random.choice(list(edges.keys()))
from timeit import timeit
t = timeit(lambda:getDistances(origin,edges),number=1)
print(f"{t:.2f} seconds for",edgeCount,"edges", "max distance = ",maxDistance)
# 3.06 seconds for 10000000 edges max distance = 4
假设我有一个未加权的有向图。我想知道是否有一种方法可以存储起始节点和图中所有剩余节点之间的所有距离。我知道 Dijkstra 的算法可能是一种选择,但我不确定这是否是最好的算法,因为我正在处理一个相当大的图(~100k 节点),而且它是一个未加权的图。到目前为止,我的挑战是执行 BFS,同时尝试存储所有距离。这是可行的方法吗?
最后,由于我对图论还很陌生,有人可以为我指明正确的方向,以便 Python 很好地解决此类问题吗?
绝对可行,而且如果您的数据结构包含一个以起始节点标识符为索引的每个起始节点的结束节点列表:
这是一个使用字典作为边的示例:{startNode:端节点列表}
from collections import deque
maxDistance = 0
def getDistances(origin,edges):
global maxDistance
maxDistance = 0
distances = {origin:0} # {endNode:distance from origin}
toLink = deque([origin]) # start at origin (distance=0)
while toLink:
start = toLink.popleft() # previous end, will chain to next
dist = distances[start] + 1 # new next are at +1
for end in edges[start]: # next end nodes
if end in distances: continue # new ones only
distances[end] = dist # record distance
toLink.append(end) # will link from there
maxDistance = max(maxDistance,dist)
return distances
这对每个节点进行一次迭代(不包括无法访问的节点)并使用快速字典访问来跟踪到新的下一个节点的链接
使用一些随机测试数据(1000 万条边)...
import random
from collections import defaultdict
print("loading simulated graphs")
vertexCount = 100000
edgeCount = vertexCount * 100
edges = defaultdict(set)
edgesLoaded = 0
minSpan = 1 # vertexCount//2
while edgesLoaded<edgeCount:
start = random.randrange(vertexCount)
end = random.randrange(vertexCount)
if abs(start-end) > minSpan and end not in edges[start]:
edges[start].add(end)
edgesLoaded += 1
print("loaded!")
性能:
# starting from a randomly selected node
origin = random.choice(list(edges.keys()))
from timeit import timeit
t = timeit(lambda:getDistances(origin,edges),number=1)
print(f"{t:.2f} seconds for",edgeCount,"edges", "max distance = ",maxDistance)
# 3.06 seconds for 10000000 edges max distance = 4