沿一个轴组合附近的边界框
Combining nearby bounding boxes along one axis
在这里,我使用 Google Vision API 从下图中检测文本。红色框表示我想要获取的组合边界框的样本。
基本上,我从上图中得到了文本输出和边界框。在这里,我想合并位于同一行(从左到右)的边界框和文本。例如,第一行将合并在一起:
[{'description': 'บริษัทไปรษณีย์ไทย',
'vertices': [(528, 202), (741, 202), (741, 222), (528, 222)]},
{'description': 'จํากัด',
'vertices': [(754, 204), (809, 204), (809, 222), (754, 222)]},
...
到
[{'description': 'บริษัทไปรษณีย์ไทยจำกัด',
'vertices': [(528, 202), (809, 202), (809, 222), (528, 222)]},
...
以下这些行
{'description': 'RP',
'vertices': [(729, 1072), (758, 1072), (758, 1091), (729, 1091)]},
{'description': '8147',
'vertices': [(768, 1072), (822, 1072), (822, 1092), (768, 1092)]},
{'description': '3609',
'vertices': [(834, 1073), (889, 1073), (889, 1093), (834, 1093)]},
{'description': '7',
'vertices': [(900, 1073), (911, 1073), (911, 1092), (900, 1092)]},
{'description': 'TH',
将合并在一起。
当前方法
我调查了
-
- Non-max suppression algorithm
但不能根据我的需要生成一个特定的,因为它依赖于重叠像素的百分比。如果有人能提供帮助,那就太好了!
请尝试在此处使用边界框数据:https://gist.github.com/titipata/fd44572f7f6c3cc1dfbac05fb86f6081
输入:
out = [{'description': 'บริษัทไปรษณีย์ไทย',
'vertices': [(528, 202), (741, 202), (741, 222), (528, 222)]},
{'description': 'จํากัด',
'vertices': [(754, 204), (809, 204), (809, 222), (754, 222)]},
{'description': 'RP',
'vertices': [(729, 1072), (758, 1072), (758, 1091), (729, 1091)]},
{'description': '8147',
'vertices': [(768, 1072), (822, 1072), (822, 1092), (768, 1092)]},
{'description': '3609',
'vertices': [(834, 1073), (889, 1073), (889, 1093), (834, 1093)]},
{'description': '7',
'vertices': [(900, 1073), (911, 1073), (911, 1092), (900, 1092)]}
]
我假设,4个元组分别代表左上、右上、右下、左下坐标的x、y坐标(按顺序)
首先,我们需要找到所有在x方向上接近,并且在y方向上几乎相同(位置相同高度)的bbox对。 N.B:如果有遗漏,您可能需要调整两个阈值。
import numpy as np
pairs = []
threshold_y = 4 # height threshold
threshold_x = 20 # x threshold
for i in range(len(out)):
for j in range(i+1, len(out)):
left_upi, right_upi, right_lowi, left_lowi = out[i]['vertices']
left_upj, right_upj, right_lowj, left_lowj = out[j]['vertices']
# first of all, they should be in the same height range, starting Y axis should be almost same
# their starting x axis is close upto a threshold
cond1 = (abs(left_upi[1] - left_upj[1]) < threshold_y)
cond2 = (abs(right_upi[0] - left_upj[0]) < threshold_x)
cond3 = (abs(right_upj[0] - left_upi[0]) < threshold_x)
if cond1 and (cond2 or cond3):
pairs.append([i,j])
输出:
pairs
[[0, 1], [2, 3], [3, 4], [4, 5]]
- 现在,我们只有对,但我们还需要找到所有连通分量,例如,我们知道 0、1 在一个分量中,而 2、3、4、5 在另一个分量中。 (通常,图算法最适合这个任务,但为了简单起见,我做了一个迭代搜索)
merged_pairs = []
for i in range(len(pairs)):
cur_set = set()
p = pairs[i]
done = False
for k in range(len(merged_pairs)):
if p[0] in merged_pairs[k]:
merged_pairs[k].append(p[1])
done = True
if p[1] in merged_pairs[k]:
merged_pairs[k].append(p[0])
done = True
if done:
continue
cur_set.add(p[0])
cur_set.add(p[1])
match_idx = []
while True:
num_match = 0
for j in range(i+1, len(pairs)):
p2 = pairs[j]
if j not in match_idx and (p2[0] in cur_set or p2[1] in cur_set):
cur_set.add(p2[0])
cur_set.add(p2[1])
num_match += 1
match_idx.append(j)
if num_match == 0:
break
merged_pairs.append(list(cur_set))
merged_pairs = [list(set(a)) for a in merged_pairs]
输出:
merged_pairs
[[0, 1], [2, 3, 4, 5]]
替代 networkx 解决方案:
(如果您不介意使用额外的 networkx
导入,则更短)
import networkx as nx
g = nx.Graph()
g.add_edges_from([[0, 1], [2, 3], [3, 4], [4, 5]]) # pass pairs here
gs = [list(a) for a in list(nx.connected_components(g))] # get merged pairs here
print(gs)
[[0, 1], [2, 3, 4, 5]]
- 现在,我们有了所有的连接组件,我们可以根据它们的起始 x 坐标对它们进行排序,并合并边界框。
# for connected components, sort them according to x-axis and merge
out_final = []
INF = 999999999 # a large number greater than any co-ordinate
for idxs in merged_pairs:
c_bbox = []
for i in idxs:
c_bbox.append(out[i])
sorted_x = sorted(c_bbox, key = lambda x: x['vertices'][0][0])
new_sol = {}
new_sol['description'] = ''
new_sol['vertices'] = [[INF, INF], [-INF, INF], [-INF, -INF], [INF, -INF]]
for k in sorted_x:
new_sol['description'] += k['description']
new_sol['vertices'][0][0] = min(new_sol['vertices'][0][0], k['vertices'][0][0])
new_sol['vertices'][0][1] = min(new_sol['vertices'][0][1], k['vertices'][0][1])
new_sol['vertices'][1][0] = max(new_sol['vertices'][1][0], k['vertices'][1][0])
new_sol['vertices'][1][1] = min(new_sol['vertices'][1][1], k['vertices'][1][1])
new_sol['vertices'][2][0] = max(new_sol['vertices'][2][0], k['vertices'][2][0])
new_sol['vertices'][2][1] = max(new_sol['vertices'][2][1], k['vertices'][2][1])
new_sol['vertices'][3][0] = min(new_sol['vertices'][3][0], k['vertices'][3][0])
new_sol['vertices'][3][1] = max(new_sol['vertices'][3][1], k['vertices'][3][1])
out_final.append(new_sol)
final_out:
out_final
[{'description': 'บริษัทไปรษณีย์ไทยจํากัด',
'vertices': [[528, 202], [809, 202], [809, 222], [528, 222]]},
{'description': 'RP814736097',
'vertices': [[729, 1072], [911, 1072], [911, 1093], [729, 1093]]}]
在这里,我使用 Google Vision API 从下图中检测文本。红色框表示我想要获取的组合边界框的样本。
基本上,我从上图中得到了文本输出和边界框。在这里,我想合并位于同一行(从左到右)的边界框和文本。例如,第一行将合并在一起:
[{'description': 'บริษัทไปรษณีย์ไทย',
'vertices': [(528, 202), (741, 202), (741, 222), (528, 222)]},
{'description': 'จํากัด',
'vertices': [(754, 204), (809, 204), (809, 222), (754, 222)]},
...
到
[{'description': 'บริษัทไปรษณีย์ไทยจำกัด',
'vertices': [(528, 202), (809, 202), (809, 222), (528, 222)]},
...
以下这些行
{'description': 'RP',
'vertices': [(729, 1072), (758, 1072), (758, 1091), (729, 1091)]},
{'description': '8147',
'vertices': [(768, 1072), (822, 1072), (822, 1092), (768, 1092)]},
{'description': '3609',
'vertices': [(834, 1073), (889, 1073), (889, 1093), (834, 1093)]},
{'description': '7',
'vertices': [(900, 1073), (911, 1073), (911, 1092), (900, 1092)]},
{'description': 'TH',
将合并在一起。
当前方法
我调查了
-
但不能根据我的需要生成一个特定的,因为它依赖于重叠像素的百分比。如果有人能提供帮助,那就太好了!
请尝试在此处使用边界框数据:https://gist.github.com/titipata/fd44572f7f6c3cc1dfbac05fb86f6081
输入:
out = [{'description': 'บริษัทไปรษณีย์ไทย',
'vertices': [(528, 202), (741, 202), (741, 222), (528, 222)]},
{'description': 'จํากัด',
'vertices': [(754, 204), (809, 204), (809, 222), (754, 222)]},
{'description': 'RP',
'vertices': [(729, 1072), (758, 1072), (758, 1091), (729, 1091)]},
{'description': '8147',
'vertices': [(768, 1072), (822, 1072), (822, 1092), (768, 1092)]},
{'description': '3609',
'vertices': [(834, 1073), (889, 1073), (889, 1093), (834, 1093)]},
{'description': '7',
'vertices': [(900, 1073), (911, 1073), (911, 1092), (900, 1092)]}
]
我假设,4个元组分别代表左上、右上、右下、左下坐标的x、y坐标(按顺序)
首先,我们需要找到所有在x方向上接近,并且在y方向上几乎相同(位置相同高度)的bbox对。 N.B:如果有遗漏,您可能需要调整两个阈值。
import numpy as np
pairs = []
threshold_y = 4 # height threshold
threshold_x = 20 # x threshold
for i in range(len(out)):
for j in range(i+1, len(out)):
left_upi, right_upi, right_lowi, left_lowi = out[i]['vertices']
left_upj, right_upj, right_lowj, left_lowj = out[j]['vertices']
# first of all, they should be in the same height range, starting Y axis should be almost same
# their starting x axis is close upto a threshold
cond1 = (abs(left_upi[1] - left_upj[1]) < threshold_y)
cond2 = (abs(right_upi[0] - left_upj[0]) < threshold_x)
cond3 = (abs(right_upj[0] - left_upi[0]) < threshold_x)
if cond1 and (cond2 or cond3):
pairs.append([i,j])
输出:
pairs
[[0, 1], [2, 3], [3, 4], [4, 5]]
- 现在,我们只有对,但我们还需要找到所有连通分量,例如,我们知道 0、1 在一个分量中,而 2、3、4、5 在另一个分量中。 (通常,图算法最适合这个任务,但为了简单起见,我做了一个迭代搜索)
merged_pairs = []
for i in range(len(pairs)):
cur_set = set()
p = pairs[i]
done = False
for k in range(len(merged_pairs)):
if p[0] in merged_pairs[k]:
merged_pairs[k].append(p[1])
done = True
if p[1] in merged_pairs[k]:
merged_pairs[k].append(p[0])
done = True
if done:
continue
cur_set.add(p[0])
cur_set.add(p[1])
match_idx = []
while True:
num_match = 0
for j in range(i+1, len(pairs)):
p2 = pairs[j]
if j not in match_idx and (p2[0] in cur_set or p2[1] in cur_set):
cur_set.add(p2[0])
cur_set.add(p2[1])
num_match += 1
match_idx.append(j)
if num_match == 0:
break
merged_pairs.append(list(cur_set))
merged_pairs = [list(set(a)) for a in merged_pairs]
输出:
merged_pairs
[[0, 1], [2, 3, 4, 5]]
替代 networkx 解决方案:
(如果您不介意使用额外的 networkx
导入,则更短)
import networkx as nx
g = nx.Graph()
g.add_edges_from([[0, 1], [2, 3], [3, 4], [4, 5]]) # pass pairs here
gs = [list(a) for a in list(nx.connected_components(g))] # get merged pairs here
print(gs)
[[0, 1], [2, 3, 4, 5]]
- 现在,我们有了所有的连接组件,我们可以根据它们的起始 x 坐标对它们进行排序,并合并边界框。
# for connected components, sort them according to x-axis and merge
out_final = []
INF = 999999999 # a large number greater than any co-ordinate
for idxs in merged_pairs:
c_bbox = []
for i in idxs:
c_bbox.append(out[i])
sorted_x = sorted(c_bbox, key = lambda x: x['vertices'][0][0])
new_sol = {}
new_sol['description'] = ''
new_sol['vertices'] = [[INF, INF], [-INF, INF], [-INF, -INF], [INF, -INF]]
for k in sorted_x:
new_sol['description'] += k['description']
new_sol['vertices'][0][0] = min(new_sol['vertices'][0][0], k['vertices'][0][0])
new_sol['vertices'][0][1] = min(new_sol['vertices'][0][1], k['vertices'][0][1])
new_sol['vertices'][1][0] = max(new_sol['vertices'][1][0], k['vertices'][1][0])
new_sol['vertices'][1][1] = min(new_sol['vertices'][1][1], k['vertices'][1][1])
new_sol['vertices'][2][0] = max(new_sol['vertices'][2][0], k['vertices'][2][0])
new_sol['vertices'][2][1] = max(new_sol['vertices'][2][1], k['vertices'][2][1])
new_sol['vertices'][3][0] = min(new_sol['vertices'][3][0], k['vertices'][3][0])
new_sol['vertices'][3][1] = max(new_sol['vertices'][3][1], k['vertices'][3][1])
out_final.append(new_sol)
final_out:
out_final
[{'description': 'บริษัทไปรษณีย์ไทยจํากัด',
'vertices': [[528, 202], [809, 202], [809, 222], [528, 222]]},
{'description': 'RP814736097',
'vertices': [[729, 1072], [911, 1072], [911, 1093], [729, 1093]]}]