Using YOLO or other image recognition techniques to identify all alphanumeric text present in images

I have multiple diagram images, all of which contain labels as alphanumeric characters rather than just plain text labels. I want my YOLO model to identify all the numbers and alphanumeric characters present in them.

How can I train my YOLO model to do this? The dataset can be found here: https://drive.google.com/open?id=1iEkGcreFaBIJqUdAADDXJbUrSj99bvoi

For example, look at the bounding boxes. I want YOLO to detect wherever text is present. Recognizing the text inside the boxes is not necessary at this point, though.

The same needs to be done for these types of images as well.

The images can be downloaded here.

This is what I have tried using OpenCV, but it does not work for all the images in the dataset.

import cv2
import numpy as np
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Users\HPO2KOR\AppData\Local\Tesseract-OCR\tesseract.exe"

# Convert to grayscale and Otsu's threshold
image = cv2.imread(r'C:\Users\HPO2KOR\Desktop\Work\venv\Patent\PARTICULATE DETECTOR\PD4.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
clean = thresh.copy()

# Remove horizontal lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15,1))
detect_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
cnts = cv2.findContours(detect_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(clean, [c], -1, 0, 3)

# Remove vertical lines
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,30))
detect_vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)
cnts = cv2.findContours(detect_vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(clean, [c], -1, 0, 3)

# Remove small noise, fill large shapes, and mask out rectangular boxes
cnts = cv2.findContours(clean, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area < 100:
        cv2.drawContours(clean, [c], -1, 0, 3)
    elif area > 1000:
        cv2.drawContours(clean, [c], -1, 0, -1)
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)
    x,y,w,h = cv2.boundingRect(c)
    if len(approx) == 4:
        cv2.rectangle(clean, (x, y), (x + w, y + h), 0, -1)

# Morph open then close to merge the remaining text pixels into blobs
open_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2,2))
opening = cv2.morphologyEx(clean, cv2.MORPH_OPEN, open_kernel, iterations=2)
close_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,2))
close = cv2.morphologyEx(opening, cv2.MORPH_CLOSE, close_kernel, iterations=4)

# OCR each candidate ROI and keep only purely alphanumeric results
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    area = cv2.contourArea(c)
    if area > 500:
        ROI = image[y:y+h, x:x+w]
        ROI = cv2.GaussianBlur(ROI, (3,3), 0)
        data = pytesseract.image_to_string(ROI, lang='eng', config='--psm 6')
        if data.isalnum():
            cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 2)
            print(data)

cv2.imwrite('image.png', image)
cv2.imwrite('clean.png', clean)
cv2.imwrite('close.png', close)
cv2.imwrite('opening.png', opening)
cv2.waitKey()

Is there any model, OpenCV technique, or pretrained model that can do this for me? I just need bounding boxes around all the alphanumeric characters present in the images. After that, I need to identify what is inside them, but that second part is not important for now.

What you're describing appears to be OCR (optical character recognition). One OCR engine I know of is Tesseract, although there is also this one from IBM and others.
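
For illustration, Tesseract itself can already return word-level bounding boxes without any training; here is a minimal sketch using pytesseract's image_to_data (the 'diagram.png' path, the --psm 11 sparse-text mode, and the confidence cutoff are placeholder choices you would tune for your images):

import cv2
import pytesseract

# 'diagram.png' is a placeholder path for one of your images
image = cv2.imread('diagram.png')

# image_to_data returns word-level boxes along with the recognized text
data = pytesseract.image_to_data(image, config='--psm 11', output_type=pytesseract.Output.DICT)

for i, text in enumerate(data['text']):
    # Skip empty detections and low-confidence noise (the 50 cutoff is arbitrary)
    if text.strip() and float(data['conf'][i]) > 50:
        x, y, w, h = data['left'][i], data['top'][i], data['width'][i], data['height'][i]
        cv2.rectangle(image, (x, y), (x + w, y + h), (36, 255, 12), 2)

cv2.imwrite('boxes.png', image)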

Since YOLO was originally trained for a very different task, using it to localize text would likely require retraining it from scratch. One could try using existing packages (adjusted to your specific setting) to obtain the ground truth (although it's worth remembering that the model is generally at most only as good as its ground truth). Or, perhaps more easily, generate synthetic data for training (i.e., add text to existing drawings at positions of your choosing, then train the model to localize it).
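
A minimal sketch of that synthetic-data idea, assuming the standard YOLO label convention (class x_center y_center width height, normalized to [0, 1]) and using OpenCV's built-in fonts; real training data would want more variation in fonts, scales, and backgrounds:

import random
import cv2

def add_synthetic_text(drawing_path, out_image_path, out_label_path, n_labels=5):
    # Paste random alphanumeric strings onto a clean drawing and write
    # the matching YOLO-format ground truth (class 0 = "text")
    image = cv2.imread(drawing_path)
    h, w = image.shape[:2]
    lines = []
    for _ in range(n_labels):
        text = ''.join(random.choices('ABCDEFGHJKLMNPQRSTUVWXYZ0123456789', k=random.randint(1, 4)))
        scale = random.uniform(0.5, 1.0)
        (tw, th), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, scale, 2)
        x = random.randint(0, max(1, w - tw - 1))
        y = random.randint(th + 1, h - 1)  # putText anchors at the bottom-left corner
        cv2.putText(image, text, (x, y), cv2.FONT_HERSHEY_SIMPLEX, scale, (0, 0, 0), 2)
        lines.append('0 %.6f %.6f %.6f %.6f' % ((x + tw / 2) / w, (y - th / 2) / h, tw / w, th / h))
    cv2.imwrite(out_image_path, image)
    with open(out_label_path, 'w') as f:
        f.write('\n'.join(lines))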

Alternatively, if all your target images are structured similarly to the one above, you could try creating the ground truth using classic CV heuristics, as you did above, to separate/segment out the symbols, then classify with a CNN trained on MNIST or similar to determine whether a given blob contains a symbol.
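
A rough sketch of that classification step with a small Keras CNN trained on MNIST; note that MNIST covers digits only (letters would need something like EMNIST), and the architecture and confidence threshold are placeholder choices:

import numpy as np
from tensorflow import keras

# Train a small digit CNN; a segmented blob can then be resized to 28x28 and scored
(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train[..., np.newaxis].astype('float32') / 255.0

model = keras.Sequential([
    keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D(),
    keras.layers.Conv2D(64, 3, activation='relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=3, batch_size=128)

def looks_like_symbol(blob_28x28, threshold=0.9):
    # Treat the blob as a symbol if the classifier is confident about some class
    blob = blob_28x28.astype('float32').reshape(1, 28, 28, 1) / 255.0
    return float(model.predict(blob).max()) >= threshold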

If you do opt for YOLO, there are existing implementations in Python; for example, I have some experience with this one. Setting up training with your own ground truth should be fairly straightforward.
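
Whichever implementation you pick, the main plumbing is converting your boxes into the label files YOLO expects (one .txt per image, here with a single "text" class); a minimal sketch of that conversion:

def to_yolo_labels(boxes, image_w, image_h, out_path):
    # boxes are (x, y, w, h) in pixels; YOLO wants
    # "class x_center y_center width height" normalized to [0, 1]
    with open(out_path, 'w') as f:
        for x, y, w, h in boxes:
            xc = (x + w / 2) / image_w
            yc = (y + h / 2) / image_h
            f.write('0 %.6f %.6f %.6f %.6f\n' % (xc, yc, w / image_w, h / image_h))

# e.g. with boxes produced by the contour heuristics above:
# to_yolo_labels([(120, 40, 35, 18)], 1024, 768, 'image1.txt')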

Finally, if using YOLO or a CNN is not a goal in itself but merely a means to a solution, then any of the above "ground truth" approaches could be used directly as the solution rather than for training a model.

I hope I understood your question correctly.

A convenient approach is to use a deep learning text detector based on Zhou et al.'s 2017 paper, EAST: An Efficient and Accurate Scene Text Detector. The model was originally trained for detecting text in natural scene images, but it may be possible to apply it to diagram images. EAST is quite robust and is capable of detecting blurred or reflective text. Here is a modified version of Adrian Rosebrock's implementation of EAST. Instead of applying the text detector directly on the image, we can try to remove as many non-text objects from the image as possible before performing text detection. The idea is to remove horizontal lines, vertical lines, and non-text contours (curves, diagonals, circular shapes) before applying detection. Here are the results with some of your images:

Input -> non-text contours in green

Result

Other images

The pretrained frozen_east_text_detection.pb model necessary to perform text detection can be found here. Although the model catches most of the text, the results are not 100% accurate and there are occasional false positives, probably due to how it was trained on natural scene images. To obtain more accurate results you would probably have to train your own custom model, but if you want a decent out-of-the-box solution then this should work for you. Check out Adrian's OpenCV Text Detection (EAST text detector) blog post for a more comprehensive explanation of the EAST text detector.

Code

from imutils.object_detection import non_max_suppression
import numpy as np
import cv2

def EAST_text_detector(original, image, confidence=0.25):
    # Set the new width and height and determine the changed ratio
    (H, W) = image.shape[:2]
    (newW, newH) = (640, 640)
    rW = W / float(newW)
    rH = H / float(newH)

    # Resize the image and grab the new image dimensions
    image = cv2.resize(image, (newW, newH))
    (H, W) = image.shape[:2]

    # Define the two output layer names for the EAST detector model that
    # we are interested -- the first is the output probabilities and the
    # second can be used to derive the bounding box coordinates of text
    layerNames = [
        "feature_fusion/Conv_7/Sigmoid",
        "feature_fusion/concat_3"]

    net = cv2.dnn.readNet('frozen_east_text_detection.pb')

    # Construct a blob from the image and then perform a forward pass of
    # the model to obtain the two output layer sets
    blob = cv2.dnn.blobFromImage(image, 1.0, (W, H), (123.68, 116.78, 103.94), swapRB=True, crop=False)
    net.setInput(blob)
    (scores, geometry) = net.forward(layerNames)

    # Grab the number of rows and columns from the scores volume, then
    # initialize our set of bounding box rectangles and corresponding
    # confidence scores
    (numRows, numCols) = scores.shape[2:4]
    rects = []
    confidences = []

    # Loop over the number of rows
    for y in range(0, numRows):
        # Extract the scores (probabilities), followed by the geometrical
        # data used to derive potential bounding box coordinates that
        # surround text
        scoresData = scores[0, 0, y]
        xData0 = geometry[0, 0, y]
        xData1 = geometry[0, 1, y]
        xData2 = geometry[0, 2, y]
        xData3 = geometry[0, 3, y]
        anglesData = geometry[0, 4, y]

        # Loop over the number of columns
        for x in range(0, numCols):
            # If our score does not have sufficient probability, ignore it
            if scoresData[x] < confidence:
                continue

            # Compute the offset factor as our resulting feature maps will
            # be 4x smaller than the input image
            (offsetX, offsetY) = (x * 4.0, y * 4.0)

            # Extract the rotation angle for the prediction and then
            # compute the sin and cosine
            angle = anglesData[x]
            cos = np.cos(angle)
            sin = np.sin(angle)

            # Use the geometry volume to derive the width and height of
            # the bounding box (named boxH/boxW to avoid shadowing the
            # image dimensions above)
            boxH = xData0[x] + xData2[x]
            boxW = xData1[x] + xData3[x]

            # Compute both the starting and ending (x, y)-coordinates for
            # the text prediction bounding box
            endX = int(offsetX + (cos * xData1[x]) + (sin * xData2[x]))
            endY = int(offsetY - (sin * xData1[x]) + (cos * xData2[x]))
            startX = int(endX - boxW)
            startY = int(endY - boxH)

            # Add the bounding box coordinates and probability score to
            # our respective lists
            rects.append((startX, startY, endX, endY))
            confidences.append(scoresData[x])

    # Apply non-maxima suppression to suppress weak, overlapping bounding
    # boxes
    boxes = non_max_suppression(np.array(rects), probs=confidences)

    # Loop over the bounding boxes
    for (startX, startY, endX, endY) in boxes:
        # Scale the bounding box coordinates based on the respective
        # ratios
        startX = int(startX * rW)
        startY = int(startY * rH)
        endX = int(endX * rW)
        endY = int(endY * rH)

        # Draw the bounding box on the image
        cv2.rectangle(original, (startX, startY), (endX, endY), (36, 255, 12), 2)
    return original

# Convert to grayscale and Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
clean = thresh.copy()

# Remove horizontal lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15,1))
detect_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
cnts = cv2.findContours(detect_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(clean, [c], -1, 0, 3)

# Remove vertical lines
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,30))
detect_vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)
cnts = cv2.findContours(detect_vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(clean, [c], -1, 0, 3)

# Remove non-text contours (curves, diagonals, circular shapes)
cnts = cv2.findContours(clean, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area > 1500:
        cv2.drawContours(clean, [c], -1, 0, -1)
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)
    x,y,w,h = cv2.boundingRect(c)
    if len(approx) == 4:
        cv2.rectangle(clean, (x, y), (x + w, y + h), 0, -1)

# Bitwise-and with original image to remove contours
filtered = cv2.bitwise_and(image, image, mask=clean)
filtered[clean==0] = (255,255,255)

# Perform EAST text detection
result = EAST_text_detector(image, filtered)

cv2.imshow('filtered', filtered)
cv2.imshow('result', result)
cv2.waitKey()

For convenience, I would like to add the package keras_ocr. It can easily be installed with pip and is based on the CRAFT text detector, which, if I remember correctly, is a bit newer than the EAST detector.

Besides detection, it already does some OCR as well! The results are as seen below; consider it an alternative that may be easier to implement than the accepted answer.
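
A minimal usage sketch (pip install keras-ocr; the image path is a placeholder, and the pretrained detector/recognizer weights are downloaded on first use):

import keras_ocr

# Build the CRAFT detector + recognizer pipeline
pipeline = keras_ocr.pipeline.Pipeline()

# 'diagram.png' is a placeholder path for one of your images
images = [keras_ocr.tools.read('diagram.png')]

# Each prediction is a (word, box) pair; box is a 4x2 array of corner points
prediction_groups = pipeline.recognize(images)
for word, box in prediction_groups[0]:
    print(word, box.astype(int).tolist())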