Is it possible to train YOLO (any version) for a single class where the image has text data? (Find regions of equations)

Question

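I am wondering whether YOLO (any version, especially one that favors accuracy over speed) can be trained on text data. What I am trying to do is find the regions in a text image where an equation is present.

For example, I want to find the 2 gray regions of interest in this image so that I can outline them and eventually crop the equations separately.

I am asking because, first, I have not found anywhere that YOLO is used for text data. Second, how can the input be customized for low resolution, as opposed to (416,416), given that all the images are either cropped or horizontal, mostly in (W=2H) format?

I have implemented the YOLO-V3 version for text data, but using OpenCV, which basically runs on the CPU. I want to train the model from scratch.

Please help. Any of Keras, TensorFlow or PyTorch would do.

Here is the code I used for the implementation in OpenCV.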

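import cv2
import numpy as np

# PATH (model directory) and img (a BGR image, e.g. from cv2.imread) are assumed to be defined earlier
height, width = img.shape[:2]  # original image size, used later to rescale the boxes

net = cv2.dnn.readNet(PATH + "yolov3.weights", PATH + "yolov3.cfg")  # build the model. NOTE: this will only use the CPU
layer_names = net.getLayerNames()  # names of all layers in the network (254 layers)
# getUnconnectedOutLayers() returns 1-based indices; flatten() copes with both the nested and the flat result of different OpenCV versions
output_layers = [layer_names[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]  # 3 output layers in total

blob = cv2.dnn.blobFromImage(image=img, scalefactor=0.00392, size=(416, 416), mean=(0, 0, 0), swapRB=True)
# blob is a NumPy array of shape (1, 3, 416, 416). If you need to change the shape, change it in the config file too
# pixel values are scaled by about 1/255, the image is resized, a mean of 0 is subtracted per channel, and BGR is swapped to RGB

net.setInput(blob)

outs = net.forward(output_layers)  # list of 3 arrays, one per output layer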

class_ids = []  # class id of each kept box
confidences = []  # confidence score of each kept box
boxes = []  # kept boxes as [x, y, w, h] in pixels

for out in outs:  # go through the output layers one by one
    for detection in out:  # go through the detections one by one
        scores = detection[5:]  # class scores (80 values for these weights); detection[4] is the objectness score

        class_id = np.argmax(scores)  # index of the highest-scoring class
        confidence = scores[class_id]
        if confidence > 0.1:  # keep only boxes whose best class score is above 0.1

            # box center and size are normalized to [0, 1]; scale them to the image size
            center_x = int(detection[0] * width)  # center x of the box
            center_y = int(detection[1] * height)  # center y of the box
            w = int(detection[2] * width)  # box width
            h = int(detection[3] * height)  # box height

            # top-left corner of the rectangle
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)

            boxes.append([x, y, w, h])  # collect the bounding box
            confidences.append(float(confidence))  # collect its confidence score
            class_ids.append(class_id)  # collect its class id
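
The loop above usually leaves several overlapping boxes for the same region, so before outlining or cropping, the boxes are normally filtered with non-maximum suppression. OpenCV offers `cv2.dnn.NMSBoxes` for this. The following is only a minimal plain-Python sketch of the same idea, assuming boxes in the `[x, y, w, h]` layout used above. The function names `iou` and `non_max_suppression` and the `0.4` threshold are illustrative assumptions, not part of the original code.

```python
def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes (top-left corner plus size)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # width and height of the overlapping rectangle (0 if the boxes do not overlap)
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, confidences, iou_threshold=0.4):
    """Return the indices of the boxes to keep, highest confidence first.

    A box is dropped when it overlaps an already kept box by more than
    iou_threshold (hypothetical default, tune it for your data).
    """
    order = sorted(range(len(boxes)), key=lambda i: confidences[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

With `keep = non_max_suppression(boxes, confidences)`, each surviving region could then be cropped from the NumPy image as `img[y:y + h, x:x + w]` for `x, y, w, h = boxes[i]`, after clamping negative `x` or `y` to 0.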

Answer 1

As an object detector, YOLO can only be used for detecting specific text, not for detecting any text that might be present in the image.

For example, YOLO can be trained to do text-based logo detection like this:

[image]

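Your problem statement talks about detecting any equation (math formula) present in the images, so it cannot be done using YOLO alone. I think mathpix is similar to your use case. They would be using an OCR (Optical Character Recognition) system trained and fine-tuned for their use case.

Ultimately, to do something like mathpix, an OCR system customized for your use case is what you need. There won't be any ready-made solution for this. You will have to build one.

Proposed methods:

Note: Tesseract can't be used directly, as it is a pre-trained model trained to read any character. You can refer to the 2nd paper to train Tesseract for your use case.

To get some idea about OCR, you can read about it here.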

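Edit:

So the idea is to build your own OCR to detect what constitutes an equation/math formula rather than detecting every character. You need a dataset in which the equations are labeled. Basically, you look for regions with math symbols (such as summation, integral, etc.).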
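
As an illustration of what a labeled equation region might look like on disk, here is a minimal sketch that converts a pixel box into the normalized `class x_center y_center width height` text line used by Darknet-style detectors. Whether this format suits the OCR pipeline you end up building is an assumption. The function name `to_yolo_label`, the `(x, y, w, h)` box layout and the single class id `0` for "equation" are hypothetical choices made for the example.

```python
def to_yolo_label(box, image_width, image_height, class_id=0):
    """Convert a pixel box (x, y, w, h) with top-left origin into one label line.

    All four output values are normalized by the image size, so the same
    label stays valid if the image is resized, e.g. for wide (W=2H) inputs.
    class_id=0 is an assumed id for a single "equation" class.
    """
    x, y, w, h = box
    x_center = (x + w / 2) / image_width
    y_center = (y + h / 2) / image_height
    norm_w = w / image_width
    norm_h = h / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {norm_w:.6f} {norm_h:.6f}"
```

For example, `to_yolo_label((100, 50, 200, 100), 800, 400)` describes an equation region 200 pixels wide and 100 pixels high in an 800x400 image, and one such line would be written per labeled equation.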

Some tutorials to train your own OCR:

So the idea is that you follow these tutorials to learn how to train and build an OCR for any use case, then read the research papers I mentioned above, and use the basic ideas I gave above to build an OCR for your use case.

object-recognition

deep-learning

keras

tensorflow

yolo
