将节点添加到断开连接的图中以完全连接图组件,并具有节点间距离约束
Adding nodes to a disconnected graph in order to fully connect the graph components, with inter-node distance constraints
我有一个图表,其中每个节点都有一个由 (x,y) 给出的空间位置,并且节点之间的边仅在每个节点之间的欧氏距离为 sqrt(2) 时才连接 或更少。这是我的例子:
import networkx
G=nx.Graph()
G.add_node(1,pos=(1,1))
G.add_node(2,pos=(2,2))
G.add_node(3,pos=(1,2))
G.add_node(4,pos=(1,4))
G.add_node(5,pos=(2,5))
G.add_node(6,pos=(4,2))
G.add_node(7,pos=(5,2))
G.add_node(8,pos=(5,3))
# Connect component one
G.add_edge(1,2)
G.add_edge(1,3)
G.add_edge(2,3)
# Connect component two
G.add_edge(6,7)
# Connect component three
G.add_edge(6,8)
G.add_edge(7,8)
G.add_edge(4,5)
pos=nx.get_node_attributes(G,'pos')
nx.draw(G,pos)
我的问题是,如何确定附加节点的最佳位置和数量,以便连接图形组件,同时确保任何附加节点始终在 sqrt(2)[=20 内=] 现有节点?
我尝试将遗传算法应用于上述问题。我最初猜测两个额外的节点将连接所有三个断开连接的组件。
import networkx as nx
import pygad
import math
from libpysal import weights
import numpy as np
no_comps_target = 1 # a connected graph has 1 component
max_dist_between_nodes = math.sqrt(2) # in km
num_pts = 2 # number of additional nodes to add
num_genes = num_pts*2 # number of genes required with the GA. A gene each for x and y coordinates.
# Generate coordinate np array of existing components
centroids = np.array([(v) for k, v in pos.items()])
# Create indcies for new nodes within the GA solution list
y_ix = [x for x in range(num_genes) if x % 2 != 0]
x_ix = [x for x in range(num_genes) if x % 2 == 0]
# Define fitness function
def my_fitness_func(solution, solution_idx):
# Select coordinates of GA solution
xs = np.array(solution[x_ix])
ys = np.array(solution[y_ix])
new_pts = np.column_stack((xs,ys))
# Create one set for all coordinates
all_pts = np.append(centroids, new_pts,axis=0)
# Calculate weights using a distance band equal to the max distance between nodes
w = weights.DistanceBand.from_array(all_pts,
threshold=max_dist_between_nodes,
silence_warnings=True)
# Convert to a networkx obejct
G = w.to_networkx()
# Calculate the number of graph components for G
# Target is 1 - fully connected graph
no_comps_solution = nx.number_connected_components(G)
# Calculate solution fitness
fitness = 1.0 / np.abs(no_comps_target - no_comps_solution + 0.000001)
return fitness
# Set constraints on possible solution locations
x_max = 5
x_min = 0
y_max = 5
y_min = 1
ga_instance = pygad.GA(
num_generations=20,
num_parents_mating=2,
fitness_func=my_fitness_func,
sol_per_pop=10,
num_genes=num_genes,
gene_type= int,
gene_space= [{'low': x_min, 'high': x_max, 'step': 1},
{'low': y_min, 'high': y_max, 'step': 1}] * num_pts,
mutation_num_genes=1,
parent_selection_type = "sss",
keep_parents =2,
stop_criteria="saturate_10" # Stop if no progress after 10 generations
)
ga_instance.run()
# If final reached a maximum we should expect a fitness of 100,000.
solution, solution_fitness, solution_idx = ga_instance.best_solution()
print("Parameters of the best solution : {solution}".format(solution=solution))
print("Fitness value of the best solution = {solution_fitness}".format(solution_fitness=solution_fitness))
这给出了一个有效的解决方案:
Parameters of the best solution : [3 3 2 4]
Fitness value of the best solution = 1000000.0
从 运行 多次,我得到了多个有效的解决方案。我认为这是有道理的。另外,如何确定附加节点的最佳数量?特别是如果这个问题要大得多。我仍然想知道是否有其他方法可以解决这个问题。特别是如果它们的代码更少!
由于一个顶点的最大长度可以是sqrt(2),我们可以
假设我们的图表需要存在于网格世界中,其中
节点具有正整数 x 和 y 坐标,并且顶点
只能连接附近的节点,受长度限制。
解决方案是一次扩展一个组件
使组件靠得更近的节点和顶点。
该问题的连通图将添加节点 9 和 10,
与连接顶点。请参见下面的第一张图片。
除了OP问题中提到的测试,
似乎更有趣的测试是节点 4 位于位置 (3, 4)。这种情况下的最优解在一个 Steiner 点(参见 enter link description here)。
要为小图构建这样的解决方案,我们可以使用
添加一个节点和连接一个节点的边的算法
彼此最近的一对节点。以适应
构建一个斯坦纳树,我们允许选择的位置
点在对角线位置。也就是说,节点 2 和 6 具有
相同的 y 坐标,但我们将沿对角线创建节点 9,以便
我们可以建造一座通往下一个最近邻居的桥梁。见图片
下面。
为了防止构建额外的节点,应该在
转弯。由于我们已经对节点进行了编号,因此可以通过
选择要增长的最小节点号。这导致
在需要更多 space 时种植更大的 Steiner 树
被越过。见下图,其中节点 9 到 12
添加到原始图表的位置。
这个(而且非常低效)的粗略算法是
如下。欧氏距离函数修改自
Russell & Norvig(Github 人工智能库,
Stuart Russell 和 Peter Norvig 的现代方法)。
import itertools
import networkx as nx
import numpy as np
# Assumes graph G created same or similar to code in OP's question
def closest_nodes(c1, c2, components):
nodes_c1 = [(n, G.nodes[n]['pos']) for n in components[c1]]
nodes_c2 = [(n,G.nodes[n]['pos']) for n in components[c2]]
xnodes = itertools.product(nodes_c1, nodes_c2)
closest = min(xnodes, key=lambda x: euclidean_distance(x[0], x[1]))
return euclidean_distance(closest[0], closest[1]), closest
def next_closest_node(n1, c1, c2, components):
other_nodes = []
for i,v in enumerate(components):
if i == c1 or i == c2:
continue
other_nodes.extend((n, G.nodes[n]['pos']) for n in v)
if len(other_nodes) == 0:
return None
# print('next closest', n1, other_nodes)
closest = min(other_nodes, key=lambda x: euclidean_distance(x, n1))
return closest
def closest_comps(components):
xcomps = itertools.combinations(range(len(components)), 2)
closest = min(xcomps,
key=lambda x: closest_nodes(x[0], x[1], components)[0])
return closest
def euclidean_distance(x, y):
xc = x[1]
yc = y[1]
# print('e_d', x, y)
return np.sqrt(sum((_x - _y) ** 2 for _x, _y in zip(xc, yc)))
def determine_pos(sel_comps, sel_nodes, components):
next_closest = next_closest_node(sel_nodes[1][0], *sel_comps, components)
# Grow from the smallest node number
if sel_nodes[1][0][0] < sel_nodes[1][1][0]:
from_node = sel_nodes[1][0][0]
start_pos = sel_nodes[1][0][1]
target_pos = sel_nodes[1][1][1]
else:
from_node = sel_nodes[1][1][0]
start_pos = sel_nodes[1][1][1]
target_pos = sel_nodes[1][0][1]
diff_x = target_pos[0]-start_pos[0]
diff_y = target_pos[1]-start_pos[1]
new_x, new_y = start_pos[0], start_pos[1]
if diff_x != 0:
new_x += diff_x//abs(diff_x)
elif next_closest is not None:
diff_x = next_closest[1][0]-start_pos[0]
if diff_x != 0:
new_x += diff_x//abs(diff_x)
if diff_y != 0:
new_y += diff_y//abs(diff_y)
elif next_closest is not None:
diff_y = next_closest[1][1]-start_pos[1]
if diff_y != 0:
new_y += diff_y//abs(diff_y)
return from_node, (new_x, new_y)
while not nx.is_connected(G):
components = [c for c in
sorted(nx.connected_components(G), key=len, reverse=True)]
sel_comps = closest_comps(components)
sel_nodes = closest_nodes(*sel_comps, components)
sel_dist = euclidean_distance(sel_nodes[1][0], sel_nodes[1][1])
edge_needed = sel_dist <= np.sqrt(2)
if edge_needed:
G.add_edge(sel_nodes[1][0][0], sel_nodes[1][1][0])
else:
new_pos = determine_pos(sel_comps, sel_nodes, components)
new_node = G.number_of_nodes() + 1
G.add_node(new_node, pos=new_pos[1])
G.add_edge(new_pos[0], new_node)
pos = nx.get_node_attributes(G, 'pos')
nx.draw(G, pos, with_labels=True)
我确信这个问题是 NP-hard 问题。我知道的最接近的问题是具有八线度量的几何斯坦纳树问题。
我有两个相当快捷的建议。两者都是启发式的。
第一个想法:将问题表述为欧几里得斯坦纳树问题 (https://en.wikipedia.org/wiki/Steiner_tree_problem#Euclidean_Steiner_tree),您首先只考虑问题的节点,而忘记边。使用 GeoSteiner 解决问题:
http://www.geosteiner.com/
这应该可以快速为您提供 10000 个或更多节点问题的解决方案(如果您需要解决更大的问题,您可以在完整的 Steiner 树生成后使用 GeoSteiner 写出问题并使用 https://scipjack.zib.de/). There is no Python interface, but just write your problem to a plain text file, the syntax is quite easy. Afterward, put additional nodes into the solution provided by GeoSteiner such that the \sqrt(2) condition is satisfied. Finally, you need to do some clean-up to get rid of redundant edges, because the solution will not take into account that you already have edges in your original problem. Take all the edges and nodes that you have computed so far and define a weighted graph by giving all of your original edges weight 0 and all of the newly added edges weight 1. Consider a Steiner tree problem in graphs (https://en.wikipedia.org/wiki/Steiner_tree_problem#Steiner_tree_in_graphs_and_variants) on this weighted graph, where the terminal set corresponds to your original nodes. Solve this Steiner tree problem with SCIP-Jack: https://scipjack.zib.de/.
思路二:直接将你的问题看成图上的Steiner树问题如下:每条原始边的权重为0,将所有原始节点视为终端。在最多 \sqrt(2) 距离处添加额外的节点和边。例如,您可以在所有连接的组件周围放置一个大矩形,并从每个节点以 0、45、90 度的角度递归地添加额外的 8 个节点,...距离为 sqrt(2) 且边缘为权重1 在图的斯坦纳树问题中,只要它们在矩形内。如果这些节点之一在原始节点之一的距离 sqrt(2) 以内,则将它们直接与权重为 1 的边连接。使用 SCIP-Jack 解决图中相应的 Steiner 树问题。
我有一个图表,其中每个节点都有一个由 (x,y) 给出的空间位置,并且节点之间的边仅在每个节点之间的欧氏距离为 sqrt(2) 时才连接 或更少。这是我的例子:
import networkx
G=nx.Graph()
G.add_node(1,pos=(1,1))
G.add_node(2,pos=(2,2))
G.add_node(3,pos=(1,2))
G.add_node(4,pos=(1,4))
G.add_node(5,pos=(2,5))
G.add_node(6,pos=(4,2))
G.add_node(7,pos=(5,2))
G.add_node(8,pos=(5,3))
# Connect component one
G.add_edge(1,2)
G.add_edge(1,3)
G.add_edge(2,3)
# Connect component two
G.add_edge(6,7)
# Connect component three
G.add_edge(6,8)
G.add_edge(7,8)
G.add_edge(4,5)
pos=nx.get_node_attributes(G,'pos')
nx.draw(G,pos)
我的问题是,如何确定附加节点的最佳位置和数量,以便连接图形组件,同时确保任何附加节点始终在 sqrt(2)[=20 内=] 现有节点?
我尝试将遗传算法应用于上述问题。我最初猜测两个额外的节点将连接所有三个断开连接的组件。
import networkx as nx
import pygad
import math
from libpysal import weights
import numpy as np
no_comps_target = 1 # a connected graph has 1 component
max_dist_between_nodes = math.sqrt(2) # in km
num_pts = 2 # number of additional nodes to add
num_genes = num_pts*2 # number of genes required with the GA. A gene each for x and y coordinates.
# Generate coordinate np array of existing components
centroids = np.array([(v) for k, v in pos.items()])
# Create indcies for new nodes within the GA solution list
y_ix = [x for x in range(num_genes) if x % 2 != 0]
x_ix = [x for x in range(num_genes) if x % 2 == 0]
# Define fitness function
def my_fitness_func(solution, solution_idx):
# Select coordinates of GA solution
xs = np.array(solution[x_ix])
ys = np.array(solution[y_ix])
new_pts = np.column_stack((xs,ys))
# Create one set for all coordinates
all_pts = np.append(centroids, new_pts,axis=0)
# Calculate weights using a distance band equal to the max distance between nodes
w = weights.DistanceBand.from_array(all_pts,
threshold=max_dist_between_nodes,
silence_warnings=True)
# Convert to a networkx obejct
G = w.to_networkx()
# Calculate the number of graph components for G
# Target is 1 - fully connected graph
no_comps_solution = nx.number_connected_components(G)
# Calculate solution fitness
fitness = 1.0 / np.abs(no_comps_target - no_comps_solution + 0.000001)
return fitness
# Set constraints on possible solution locations
x_max = 5
x_min = 0
y_max = 5
y_min = 1
ga_instance = pygad.GA(
num_generations=20,
num_parents_mating=2,
fitness_func=my_fitness_func,
sol_per_pop=10,
num_genes=num_genes,
gene_type= int,
gene_space= [{'low': x_min, 'high': x_max, 'step': 1},
{'low': y_min, 'high': y_max, 'step': 1}] * num_pts,
mutation_num_genes=1,
parent_selection_type = "sss",
keep_parents =2,
stop_criteria="saturate_10" # Stop if no progress after 10 generations
)
ga_instance.run()
# If final reached a maximum we should expect a fitness of 100,000.
solution, solution_fitness, solution_idx = ga_instance.best_solution()
print("Parameters of the best solution : {solution}".format(solution=solution))
print("Fitness value of the best solution = {solution_fitness}".format(solution_fitness=solution_fitness))
这给出了一个有效的解决方案:
Parameters of the best solution : [3 3 2 4]
Fitness value of the best solution = 1000000.0
从 运行 多次,我得到了多个有效的解决方案。我认为这是有道理的。另外,如何确定附加节点的最佳数量?特别是如果这个问题要大得多。我仍然想知道是否有其他方法可以解决这个问题。特别是如果它们的代码更少!
由于一个顶点的最大长度可以是sqrt(2),我们可以 假设我们的图表需要存在于网格世界中,其中 节点具有正整数 x 和 y 坐标,并且顶点 只能连接附近的节点,受长度限制。
解决方案是一次扩展一个组件 使组件靠得更近的节点和顶点。 该问题的连通图将添加节点 9 和 10, 与连接顶点。请参见下面的第一张图片。
除了OP问题中提到的测试, 似乎更有趣的测试是节点 4 位于位置 (3, 4)。这种情况下的最优解在一个 Steiner 点(参见 enter link description here)。
要为小图构建这样的解决方案,我们可以使用 添加一个节点和连接一个节点的边的算法 彼此最近的一对节点。以适应 构建一个斯坦纳树,我们允许选择的位置 点在对角线位置。也就是说,节点 2 和 6 具有 相同的 y 坐标,但我们将沿对角线创建节点 9,以便 我们可以建造一座通往下一个最近邻居的桥梁。见图片 下面。
为了防止构建额外的节点,应该在 转弯。由于我们已经对节点进行了编号,因此可以通过 选择要增长的最小节点号。这导致 在需要更多 space 时种植更大的 Steiner 树 被越过。见下图,其中节点 9 到 12 添加到原始图表的位置。
这个(而且非常低效)的粗略算法是 如下。欧氏距离函数修改自 Russell & Norvig(Github 人工智能库, Stuart Russell 和 Peter Norvig 的现代方法)。
import itertools
import networkx as nx
import numpy as np
# Assumes graph G created same or similar to code in OP's question
def closest_nodes(c1, c2, components):
nodes_c1 = [(n, G.nodes[n]['pos']) for n in components[c1]]
nodes_c2 = [(n,G.nodes[n]['pos']) for n in components[c2]]
xnodes = itertools.product(nodes_c1, nodes_c2)
closest = min(xnodes, key=lambda x: euclidean_distance(x[0], x[1]))
return euclidean_distance(closest[0], closest[1]), closest
def next_closest_node(n1, c1, c2, components):
other_nodes = []
for i,v in enumerate(components):
if i == c1 or i == c2:
continue
other_nodes.extend((n, G.nodes[n]['pos']) for n in v)
if len(other_nodes) == 0:
return None
# print('next closest', n1, other_nodes)
closest = min(other_nodes, key=lambda x: euclidean_distance(x, n1))
return closest
def closest_comps(components):
xcomps = itertools.combinations(range(len(components)), 2)
closest = min(xcomps,
key=lambda x: closest_nodes(x[0], x[1], components)[0])
return closest
def euclidean_distance(x, y):
xc = x[1]
yc = y[1]
# print('e_d', x, y)
return np.sqrt(sum((_x - _y) ** 2 for _x, _y in zip(xc, yc)))
def determine_pos(sel_comps, sel_nodes, components):
next_closest = next_closest_node(sel_nodes[1][0], *sel_comps, components)
# Grow from the smallest node number
if sel_nodes[1][0][0] < sel_nodes[1][1][0]:
from_node = sel_nodes[1][0][0]
start_pos = sel_nodes[1][0][1]
target_pos = sel_nodes[1][1][1]
else:
from_node = sel_nodes[1][1][0]
start_pos = sel_nodes[1][1][1]
target_pos = sel_nodes[1][0][1]
diff_x = target_pos[0]-start_pos[0]
diff_y = target_pos[1]-start_pos[1]
new_x, new_y = start_pos[0], start_pos[1]
if diff_x != 0:
new_x += diff_x//abs(diff_x)
elif next_closest is not None:
diff_x = next_closest[1][0]-start_pos[0]
if diff_x != 0:
new_x += diff_x//abs(diff_x)
if diff_y != 0:
new_y += diff_y//abs(diff_y)
elif next_closest is not None:
diff_y = next_closest[1][1]-start_pos[1]
if diff_y != 0:
new_y += diff_y//abs(diff_y)
return from_node, (new_x, new_y)
while not nx.is_connected(G):
components = [c for c in
sorted(nx.connected_components(G), key=len, reverse=True)]
sel_comps = closest_comps(components)
sel_nodes = closest_nodes(*sel_comps, components)
sel_dist = euclidean_distance(sel_nodes[1][0], sel_nodes[1][1])
edge_needed = sel_dist <= np.sqrt(2)
if edge_needed:
G.add_edge(sel_nodes[1][0][0], sel_nodes[1][1][0])
else:
new_pos = determine_pos(sel_comps, sel_nodes, components)
new_node = G.number_of_nodes() + 1
G.add_node(new_node, pos=new_pos[1])
G.add_edge(new_pos[0], new_node)
pos = nx.get_node_attributes(G, 'pos')
nx.draw(G, pos, with_labels=True)
我确信这个问题是 NP-hard 问题。我知道的最接近的问题是具有八线度量的几何斯坦纳树问题。 我有两个相当快捷的建议。两者都是启发式的。
第一个想法:将问题表述为欧几里得斯坦纳树问题 (https://en.wikipedia.org/wiki/Steiner_tree_problem#Euclidean_Steiner_tree),您首先只考虑问题的节点,而忘记边。使用 GeoSteiner 解决问题: http://www.geosteiner.com/ 这应该可以快速为您提供 10000 个或更多节点问题的解决方案(如果您需要解决更大的问题,您可以在完整的 Steiner 树生成后使用 GeoSteiner 写出问题并使用 https://scipjack.zib.de/). There is no Python interface, but just write your problem to a plain text file, the syntax is quite easy. Afterward, put additional nodes into the solution provided by GeoSteiner such that the \sqrt(2) condition is satisfied. Finally, you need to do some clean-up to get rid of redundant edges, because the solution will not take into account that you already have edges in your original problem. Take all the edges and nodes that you have computed so far and define a weighted graph by giving all of your original edges weight 0 and all of the newly added edges weight 1. Consider a Steiner tree problem in graphs (https://en.wikipedia.org/wiki/Steiner_tree_problem#Steiner_tree_in_graphs_and_variants) on this weighted graph, where the terminal set corresponds to your original nodes. Solve this Steiner tree problem with SCIP-Jack: https://scipjack.zib.de/.
思路二:直接将你的问题看成图上的Steiner树问题如下:每条原始边的权重为0,将所有原始节点视为终端。在最多 \sqrt(2) 距离处添加额外的节点和边。例如,您可以在所有连接的组件周围放置一个大矩形,并从每个节点以 0、45、90 度的角度递归地添加额外的 8 个节点,...距离为 sqrt(2) 且边缘为权重1 在图的斯坦纳树问题中,只要它们在矩形内。如果这些节点之一在原始节点之一的距离 sqrt(2) 以内,则将它们直接与权重为 1 的边连接。使用 SCIP-Jack 解决图中相应的 Steiner 树问题。