Is it possible to train YOLO (any version) on a single class when the images contain text? (Finding the regions that contain equations)
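I want to know whether it is possible to train YOLO (any version, preferably one that favors accuracy over speed) on text data. What I want to do is find the regions of a text image where any equations are present.
For example, I want to find the 2 gray regions of interest in this image so that I can outline them and eventually crop the equations separately.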
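I'm asking because:
First, I haven't found anywhere that YOLO is used on text data.
Second, how can we customize it for a lower, non-square resolution instead of (416,416)? All the images are cropped or in landscape orientation, and most of them have a W = 2H format.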
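For reference on the second point: YOLOv3's input does not have to be square. A minimal sketch, assuming the standard Darknet YOLOv3 layout (three detection scales, three anchors per scale, coarsest stride of 32): the input width and height only need to be multiples of 32, and for a single class the convolution feeding each `[yolo]` layer needs (1 + 5) × 3 = 18 filters. The helper names and the default `target_h` below are hypothetical.

```python
STRIDE = 32  # assumption: YOLOv3 downsamples by 32 at its coarsest detection scale


def yolo_input_size(img_w, img_h, target_h=320, stride=STRIDE):
    """Pick a network input (width, height) that roughly keeps the image's
    aspect ratio while making both sides multiples of the stride."""
    # Round the requested height to the nearest multiple of the stride
    net_h = max(stride, round(target_h / stride) * stride)
    # Scale the width by the aspect ratio, then round it the same way
    net_w = max(stride, round(net_h * img_w / img_h / stride) * stride)
    return net_w, net_h


def yolo_head_filters(num_classes, anchors_per_scale=3):
    """Filters for the conv layer before each [yolo] layer: every anchor
    predicts 4 box coordinates + 1 objectness score + num_classes scores."""
    return (num_classes + 5) * anchors_per_scale
```

For a W = 2H page, `yolo_input_size(1000, 500)` gives `(640, 320)`; the same numbers would then go into the `width`/`height` of the `.cfg` file and into `size=(640, 320)` of `cv2.dnn.blobFromImage`, which takes `(width, height)`.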
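I have implemented YOLOv3 for text data, but with OpenCV, which basically runs on the CPU. I want to train the model from scratch.
Please help. Keras, TensorFlow or PyTorch are all fine.
Here is my implementation in OpenCV: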
```python
import cv2
import numpy as np

PATH = "./"                   # placeholder: directory holding yolov3.weights and yolov3.cfg
img = cv2.imread("page.png")  # placeholder input image (loaded as BGR)
height, width = img.shape[:2]

net = cv2.dnn.readNet(PATH + "yolov3.weights", PATH + "yolov3.cfg")  # build the model. NOTE: this only uses the CPU
layer_names = net.getLayerNames()  # names of all layers in the network (254 here)
# getUnconnectedOutLayers() returns 1-based indices (shape (N, 1) in older OpenCV, (N,) in newer);
# YOLOv3 has 3 output layers, one per detection scale
output_layers = [layer_names[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]

blob = cv2.dnn.blobFromImage(image=img, scalefactor=1 / 255.0, size=(416, 416), mean=(0, 0, 0), swapRB=True, crop=False)
# blob is a numpy array of shape (1, 3, 416, 416). If you change the size here, change width/height in the .cfg file too.
# blobFromImage resizes, subtracts the mean (0 for every channel), scales pixels to [0, 1] and swaps BGR to RGB
net.setInput(blob)
outs = net.forward(output_layers)  # list of 3 arrays, one per output layer (detection scale)

class_ids = []    # predicted class id of each kept box
confidences = []  # confidence score of each kept box
boxes = []        # kept boxes as [x, y, w, h] in pixels

for out in outs:             # one array per detection scale
    for detection in out:    # one row per candidate box
        scores = detection[5:]              # one score per class (80 for COCO)
        class_id = int(np.argmax(scores))   # most likely class for this box
        confidence = scores[class_id]
        if confidence > 0.1:  # keep only boxes whose score exceeds 0.1
            # box center and size are normalized to [0, 1] relative to the image
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            # top-left corner of the rectangle
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)
```
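The loop above collects every box over the threshold but never removes overlapping duplicates, so one equation will usually come back as several stacked boxes. OpenCV offers `cv2.dnn.NMSBoxes(boxes, confidences, score_threshold, nms_threshold)` for this step; the pure-Python sketch below only illustrates what greedy non-maximum suppression does for a single class, and its function names and default thresholds are assumptions.

```python
def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    # Width and height of the overlapping rectangle (0 if the boxes do not overlap)
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, confidences, score_threshold=0.5, iou_threshold=0.4):
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it does not overlap an already-kept box too much.
    Returns the indices of the kept boxes."""
    order = sorted(
        (i for i, c in enumerate(confidences) if c >= score_threshold),
        key=lambda i: confidences[i],
        reverse=True,
    )
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Each kept index `i` can then be cropped from the image with something like `x, y, w, h = boxes[i]` followed by `img[max(y, 0):y + h, max(x, 0):x + w]`.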
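As an object detector, YOLO can only be used to detect specific text, not any text that may be present in an image.
For example, YOLO can be trained to do text-based logo detection.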
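Training YOLO on one specific kind of text region mostly comes down to the labels. A minimal sketch, assuming the common Darknet/Ultralytics text-label convention (one `.txt` file per image, one line per box: `class_id x_center y_center width height`, all normalized to 0..1); the function name and the single class id 0 are hypothetical.

```python
def to_yolo_label(box, img_w, img_h, class_id=0):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into one YOLO label line.
    class_id=0 assumes a single hypothetical class such as "logo" or "equation"."""
    x_min, y_min, x_max, y_max = box
    # Box center and size, normalized by the image dimensions
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

For a 1000×500 image with a region spanning (100, 50) to (300, 150), this produces the line `0 0.200000 0.200000 0.200000 0.200000`.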
"I want to find the 2 gray regions of interest in this image so that I can outline and eventually crop the equations separately."
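Your problem statement is about detecting any equation (mathematical formula) present in an image, so it can't be done with YOLO alone. I think Mathpix is similar to your use case: they would use an OCR (Optical Character Recognition) system trained and fine-tuned for their use case.
In the end, to do something like Mathpix, what you need is an OCR system customized for your use case. There won't be any ready-made, off-the-shelf solution for this; you will have to build one.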
Suggested approaches:
- Mathematical Formula Detection in Heterogeneous Document Images
- A Simple Equation Region Detector for Printed Document Images in Tesseract
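Note: Tesseract can't be used directly, because it is a pretrained model trained to read any character. You can refer to the second paper to train Tesseract for your use case.
To get an idea of how OCR works, you can read about it here.
Edit:
So the idea is to build your own OCR that detects what constitutes an equation/math formula, rather than detecting every character. You need a dataset with labeled equations. Basically, you look for regions with math symbols (such as summations, integrals, etc.).
Some tutorials for training your own OCR: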
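As a very rough illustration of "look for regions with math symbols" (not the method of either paper above), assume an OCR step has already produced text lines with their bounding boxes. A naive filter could then flag lines whose share of math symbols passes a threshold; the symbol set, the 0.25 threshold and the function names are all assumptions, and a trained detector would replace this heuristic.

```python
# Assumption: a hand-picked set of characters that rarely appear in plain prose
MATH_SYMBOLS = set("=+*/^<>≤≥≠≈∑∫∏√∞±×÷∂∇")


def math_symbol_ratio(text):
    """Fraction of non-whitespace characters that are math symbols."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in MATH_SYMBOLS for c in chars) / len(chars)


def find_equation_lines(lines, threshold=0.25):
    """lines: (text, box) pairs from some OCR step.
    Returns the boxes of lines that look like equations."""
    return [box for text, box in lines if math_symbol_ratio(text) >= threshold]
```

For example, `"x = a + b"` has 2 math symbols among 5 non-space characters (ratio 0.4) and would be flagged, while an ordinary sentence scores 0.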
- Tesseract training guide
- Creating OCR pipeline using CV and DL
- Build OCR pipeline
- Build Your OCR
- Attention OCR
So the idea is that you follow these tutorials to learn how to train and build an OCR for any use case, and then read the research papers mentioned above, together with the basic ideas I gave above, to build an OCR for your use case.