Extracting Semi-structured Text from a complex UI (Golf Simulator)

Question

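I'm very new to OCR, OpenCV, Tesseract, and the like, and I'm hoping to get some advice or a nudge in the right direction for a project I'm working on. For context, I practice golf at an indoor simulator powered by Full Swing Golf. My goal is to build an app (preferably for iPhone, but desktop would work too) that can take the data the simulator provides and process it however I like. The overall workflow would look something like this: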

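1. Set up an iPhone or laptop camera to watch the simulator screen.
2. Hit a shot.
3. The stats screen that appears looks more or less like this:

4. Detect that the stats screen has been displayed, and grab all the relevant data: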

| Distance | Launch | Back Spin | Club Speed | Carry | To Pin | Direction | Ball Speed | Side Spin | Club Face | Club Path |
|----------|--------|-----------|------------|-------|--------|-----------|------------|-----------|-----------|-----------|
| 345      | 13     | 3350      | 135        | 335   | 80     | 2.4       | 190        | 350       | 4.3       | 1.6       |

5-?. Save the data to my app, track it over time, etc.

Attempts so far:

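OpenCV's matchTemplate seemed like an easy way to find all the headings in the image (Distance, Launch, etc.), and it does work when both the image and the templates are at exactly the right resolution. However, since this will be an iPhone app, image quality isn't something I can really guarantee (within reason). On top of that, the screen will almost never be viewed straight-on as shown above. Most likely the camera will be off to one side, and we'll have to de-skew accordingly. I tried using the image below to work out my de-skewing logic, but to no avail: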

Because of the template-matching problems described above, finding reference points to de-skew with getPerspectiveTransform and warpPerspective has proven very difficult.

I also tried adjusting the scale dynamically, with code similar to the following:

```
import cv2
import imutils
import numpy as np

# image_gray (the grayscale camera frame) is assumed to be defined elsewhere


def findTemplateLocation(image_path):
    template = cv2.imread(image_path)
    template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)

    w, h = template.shape[::-1]
    threshold = 0.65
    loc = []

    # Try the template from largest to smallest scale; stop at the first hit
    for scale in np.linspace(0.1, 2, 20)[::-1]:
        resized = imutils.resize(template, width=int(template.shape[1] * scale))
        w, h = resized.shape[::-1]
        res = cv2.matchTemplate(image_gray, resized, cv2.TM_CCOEFF_NORMED)

        loc = np.where(res >= threshold)
        if len(list(zip(*loc[::-1]))) > 0:
            break

    if loc and len(list(zip(*loc[::-1]))) > 0:
        adjusted_w = int(w/scale)
        adjusted_h = int(h/scale)
        print(str(adjusted_w) + " " + str(adjusted_h) + " " + str(scale))

        ret = []
        for pt in zip(*loc[::-1]):
            ret.append({'width': w, 'height': h, 'location': pt})

        return ret

    return None
```

This still returns a lot of false positives.

I'm hoping for some advice on how to approach this problem from the ground up. I'm open to any language or workflow.

If I am in fact on the right track, my current code is at https://gist.github.com/naderhen/9ec8d45f13d92507131d5bce0e84fad8 . Any advice on the best next steps would be greatly appreciated.

Thanks for any help you can provide!

Edit: Additional resources

I uploaded some videos and still photos from my session at the indoor simulator this weekend: https://www.dropbox.com/sh/5vub2mi4rvunyaw/AAAY1_7Q_WBV4JvmDD0dEiTDa?dl=0

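I tried a few different angles, different lighting, and so on. Please let me know if there are any other resources I can provide that might help.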

Answer 1

So, I tried two different approaches:

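Contour detection: This seemed like the most obvious approach, since the stats screen is a major part of the image and is present in all of your images. Although it did work for two of the three images, it may not be very robust to its parameters. Here are the steps I tried for contours: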

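First, get the grayscale image, or take a single channel such as the Value channel in HSV. Then threshold the image using Otsu or adaptive thresholding. After playing with a lot of the parameters involved, I got satisfactory results, which basically means a nice, whole stats screen in white on a black background. After that, sort the contours like this: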
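```
# [-2] picks the contour list in both OpenCV 3 (3 return values) and OpenCV 4 (2 return values)
contours = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[-2]
# Sort the contours to avoid unnecessary comparison in the for loop below
cntsSorted = sorted(contours, key=lambda x: cv2.contourArea(x), reverse=True)

for cnt in cntsSorted[0:20]:
    peri = cv2.arcLength(cnt, True)
    approx = cv2.approxPolyDP(cnt, 0.04 * peri, True)
    # Keep only large contours that simplify to a quadrilateral
    if len(approx) == 4 and peri > 10000:
        # Wrap cnt in a list so it is drawn as one contour, not as separate points
        cv2.drawContours(sorted_image, [cnt], -1, (0, 255, 0), 10)
```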
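Feature detection and matching: Since contours weren't robust enough, I tried another approach that I had used on a problem similar to yours. This approach is fairly robust and faster (I tried it on an Android phone two years ago, and for a 1280 x 760 image it did the job in under a second). However, after trying it on your cases, I found that your images are quite blurry. I mean, two of the images in your question look very similar at first glance, and it works for those, but the images you posted in the comments are very different from them, so it can't find a decent number of good matches (at least 10 in my case). If you can post a set of good images that you would actually encounter, I'll update this answer with my results on the new image set.

What's more, the images of the scene clearly change in perspective, which shouldn't be a problem assuming you can get a very good source image (like the first image in your question). Changes in lighting conditions, however, can be a pain. I'd suggest using a different color space such as HSV, Lab, or Luv instead of BGR. Here is where you can find a working example of how to implement your own feature matcher. Depending on the OpenCV version you're using, some code changes are needed, but I'm sure you can find the solution (I did ;)).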

A good example:

Some suggestions:

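Try to get the cleanest possible image for the one you use to match against the others (in my case, your first image). Hopefully this will leave you with less processing to do.
Try applying an unsharp mask before finding keypoints.
My results come from using ORB. You can also try other detectors/descriptors such as SURF, SIFT, and FAST.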

Finally, your template-matching approach should work in cases where only the scale changes, not the perspective.

Hope this helps! If you have any other questions, write a comment and/or let me know when you have a good image set ready (rubbing palms). Cheers!

Edit 1: Here is the code I used for feature detection and matching, with OpenCV 3.4.3 and Python 3.4:

```
import cv2
import numpy as np


def unsharp_mask(im):
    # Sharpen the image: 2 * original - blurred (the result is written back into im)
    gaussian_3 = cv2.GaussianBlur(im, (3, 3), 3.0)
    return cv2.addWeighted(im, 2.0, gaussian_3, -1.0, 0, im)


def screen_finder2(image, source, num=0):
    def resize(im, new_width):
        r = float(new_width) / im.shape[1]
        dim = (new_width, int(im.shape[0] * r))
        return cv2.resize(im, dim, interpolation=cv2.INTER_AREA)

    width = 300
    source = resize(source, new_width=width)
    image = resize(image, new_width=width)

    # Convert to Luv and keep only the first (L) channel of each image
    luv = cv2.cvtColor(image, cv2.COLOR_BGR2LUV)
    image, u, v = cv2.split(luv)

    luv = cv2.cvtColor(source, cv2.COLOR_BGR2LUV)
    source, u, v = cv2.split(luv)

    MIN_MATCH_COUNT = 10
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(image, None)
    kp2, des2 = orb.detectAndCompute(source, None)

    flann = cv2.DescriptorMatcher_create(cv2.DescriptorMatcher_FLANNBASED)
    # Without the below 2 lines, matching doesn't work
    des1 = np.asarray(des1, dtype=np.float32)
    des2 = np.asarray(des2, dtype=np.float32)

    matches = flann.knnMatch(des1, des2, k=2)

    # store all the good matches as per Lowe's ratio test
    good = []
    for m, n in matches:
        if m.distance < 0.7 * n.distance:
            good.append(m)

    if len(good) >= MIN_MATCH_COUNT:
        src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

        # Homography from `image` coordinates to `source` coordinates
        M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
        matchesMask = mask.ravel().tolist()

        # Project the outline of `image` into `source` and draw it
        h, w = image.shape
        pts = np.float32([[0, 0], [0, h-1], [w-1, h-1], [w-1, 0]]).reshape(-1, 1, 2)
        dst = cv2.perspectiveTransform(pts, M)
        source_bgr = cv2.cvtColor(source, cv2.COLOR_GRAY2BGR)
        img2 = cv2.polylines(source_bgr, [np.int32(dst)], True, (0, 0, 255), 3,
                             cv2.LINE_AA)
        cv2.imwrite("out"+str(num)+".jpg", img2)
    else:
        print("Not enough matches." + str(len(good)))
        matchesMask = None

    draw_params = dict(matchColor=(0, 255, 0),  # draw matches in green color
                       singlePointColor=None,
                       matchesMask=matchesMask,  # draw only inliers
                       flags=2)
    img3 = cv2.drawMatches(image, kp1, source, kp2, good, None, **draw_params)
    cv2.imwrite("ORB"+str(num)+".jpg", img3)


match_image = unsharp_mask(cv2.imread("source.jpg"))
image_1 = unsharp_mask(cv2.imread("Screen_1.jpg"))
screen_finder2(match_image, image_1, num=1)
```

ocr

opencv

tesseract

image-processing

python-tesseract