Is it possible to check the orientation of an image before passing it through the pytesseract OCR module?

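For my current OCR project, I am using Tesseract through its Python wrapper, pytesseract, to convert images into text files. Until now I have only passed correctly oriented images into my module, and it recognizes the text in those images properly. But when I pass a rotated image, it cannot recognize even a single word. So to get good results, I need to pass images in the correct orientation. Now I want to know whether there is any way to determine the orientation of an image before passing it to the OCR module. Please let me know what methods I can use for the orientation check.

This is the method I use for the conversion: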

import os
import re

import cv2
import pytesseract

def images_to_text(testImg):
    print('Reading images from the directory..........')
    dataFile = []
    # Define config parameters.
    # '-l eng'  for using the English language
    # '--oem 1' for using LSTM OCR Engine
    config = '-l eng --oem 1 --psm 3'
    for filename in os.listdir(testImg):
        # Read image from disk (join the path instead of calling os.chdir)
        im = cv2.imread(os.path.join(testImg, filename), cv2.IMREAD_COLOR)
        # Run tesseract OCR on image
        text = pytesseract.image_to_string(im, config=config)
        # Basic preprocessing of the text; re.sub is needed because
        # str.replace treats ' +' and '\n+' as literal strings
        text = text.replace('\t', ' ').strip()
        text = re.sub(r' +', ' ', text)
        text = re.sub(r'\n+', '\n', text)
        text = re.sub(r'\n+ +', ' ', text)

        # Write the text to a file; imgTxt is the output directory,
        # which is defined outside this snippet
        name = os.path.splitext(filename)[0] + '.txt'
        with open(os.path.join(imgTxt, name), 'w') as writeFile:
            writeFile.write("%s\n" % text)
        text = text.replace('\n', ' ')
        dataFile.append(text)
    print('writing data to file done')
    return dataFile

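@MousamSingh, you cannot check the orientation of the image directly: whenever you pass an image through Tesseract, it simply detects the text and returns a string result, which may contain noise or unwanted text.

Answer -> Before passing the image directly to Tesseract, first detect the text in the image and bind it with bounding boxes, which ends up drawing rectangles around the text. Then crop those text regions and pass them to Tesseract; this will give you much better results. As for the orientation of the image: take the coordinates of the boxes, use them to find the angle, and then rotate the image by that angle as needed.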
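The angle step can be sketched in plain Python. This is only a minimal illustration, assuming a text detector has already returned the four corner points of one rotated box, listed in order around the rectangle. The function name `box_angle_degrees` and the sample coordinates are hypothetical and do not come from any particular detection library.

```python
import math

def box_angle_degrees(corners):
    """Angle of the longer side of a four-corner box, in degrees within (-90, 90]."""
    (x0, y0), (x1, y1), (x2, y2) = corners[0], corners[1], corners[2]
    # Two adjacent edges; the longer one is taken to run along the text line.
    edge_a = (x1 - x0, y1 - y0)
    edge_b = (x2 - x1, y2 - y1)
    dx, dy = max(edge_a, edge_b, key=lambda e: math.hypot(e[0], e[1]))
    angle = math.degrees(math.atan2(dy, dx))
    # A line has no direction, so fold the result into (-90, 90].
    if angle > 90:
        angle -= 180
    elif angle <= -90:
        angle += 180
    return angle

# Hypothetical box tilted by 45 degrees (image coordinates, y grows downward).
tilted = [(0, 0), (100, 100), (90, 110), (-10, 10)]
print(box_angle_degrees(tilted))
```

Because image coordinates have y pointing down, a positive result here means the text line slopes downward to the right on screen. Note also that box geometry alone cannot tell upright text from text turned by 180 degrees.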

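I think this may help you. If it answers your question, please upvote. Thanks.

Yes, I forgot to suggest a way to detect the text...

Here is a Python repository that can be used to detect text.

github link to python code for text detection

Let me know if you need anything else. Thanks.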

I found a solution for checking the orientation of an image. pytesseract already has a method that does this.

import re
import cv2
import pytesseract

imPath = 'path_to_image'
im = cv2.imread(str(imPath), cv2.IMREAD_COLOR)
newdata = pytesseract.image_to_osd(im)
re.search(r'(?<=Rotate: )\d+', newdata).group(0)

The output of pytesseract.image_to_osd(im) is:

Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 4.21
Script: Latin
Script confidence: 1.90

We only need the Rotate value to correct the orientation, so a regular expression does the rest of the work.

re.search(r'(?<=Rotate: )\d+', newdata).group(0)
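If the other fields are of interest too (the confidence, for instance), the same text can be split into key/value pairs with the standard library alone. The sketch below is an illustration, not part of pytesseract: the helper name `parse_osd` is hypothetical, and the sample string is the output shown above.

```python
def parse_osd(osd_text):
    """Turn the 'Key: value' lines of OSD output into a dict of strings."""
    fields = {}
    for line in osd_text.splitlines():
        key, sep, value = line.partition(':')
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields

sample = """Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 4.21
Script: Latin
Script confidence: 1.90"""

osd = parse_osd(sample)
rotate = int(osd['Rotate'])
confidence = float(osd['Orientation confidence'])
print(rotate, confidence)
```

Having the confidence as a number makes it possible to skip the rotation when the detection looks unreliable; which cut-off to use is something to decide on your own data.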

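This is the final function to rotate the image back to a 0° orientation.

def rotate(image, center = None, scale = 1.0):
    # OSD reports the correction as "Rotate: N"; getRotationMatrix2D treats
    # positive angles as counter-clockwise, hence 360 - N
    angle = 360 - int(re.search(r'(?<=Rotate: )\d+', pytesseract.image_to_osd(image)).group(0))
    (h, w) = image.shape[:2]

    if center is None:
        center = (w / 2, h / 2)

    # Perform the rotation. The output canvas stays (w, h), so a non-square
    # image rotated by 90 or 270 degrees is cropped at the edges.
    M = cv2.getRotationMatrix2D(center, angle, scale)
    rotated = cv2.warpAffine(image, M, (w, h))

    return rotated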
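Since the Rotate value is a multiple of 90, the cropping that comes from keeping the original (w, h) canvas can be avoided by turning the pixel array in quarter turns. The following is a minimal sketch using NumPy rather than the warpAffine call above. It assumes the Rotate value is a clockwise correction, which is how the `360 - Rotate` expression above treats it. The function name `rotate_clockwise` is hypothetical.

```python
import numpy as np

def rotate_clockwise(image, rotate_deg):
    """Rotate an image array clockwise by a multiple of 90 degrees without cropping."""
    if rotate_deg % 90 != 0:
        raise ValueError("expected a multiple of 90 degrees")
    quarter_turns = (rotate_deg // 90) % 4
    # np.rot90 turns counter-clockwise for positive k, so negate for clockwise.
    return np.rot90(image, k=-quarter_turns)

# Tiny 2x3 stand-in for an image; a real one would come from cv2.imread.
img = np.arange(6).reshape(2, 3)
turned = rotate_clockwise(img, 90)
print(turned.shape)
```

The result has its height and width swapped for 90 and 270 degrees, so nothing is cut off. Colour images work the same way because only the first two axes are rotated.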

Edit: A better approach may be to install the tesserocr package, since it works with the latest Tesseract versions.

Conda: conda install -c conda-forge tesserocr

from tesserocr import PyTessBaseAPI, OEM, PSM

def get_angles2(img):
    # img is expected to be a PIL image
    with PyTessBaseAPI(psm=PSM.OSD_ONLY, lang="osd", oem=OEM.TESSERACT_LSTM_COMBINED) as api:
        api.SetImage(img)
        # Named osd rather than os to avoid shadowing the os module
        osd = api.DetectOrientationScript()

    if osd['orient_deg'] == 0:
        return 0
    elif osd['orient_deg'] > 90:
        return 360 - osd['orient_deg']
    else:
        return -osd['orient_deg']

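Original

My answer is based on computing the angles of the straight lines produced by a Hough transform, because no other method worked for my dataset. It is a fast approach that has proven to work well in practice.

The prerequisites for this function are grayscaling, binarization, and color inversion.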

import cv2

img = cv2.imread('test0.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
img = cv2.bitwise_not(img)

After this, you can run the function below and get the angles of all the Hough lines detected. Adjust the threshold parameter (currently 300) as described in the OpenCV documentation: "Accumulator threshold parameter. Only those lines are returned that get enough votes ( >threshold )." For more information on calculating the angle from the (x, y) coordinates, refer to this.

import cv2
import numpy as np

def get_angles(img):
    edges = cv2.Canny(img, 50, 150, apertureSize = 3)
    lines = cv2.HoughLines(edges, 1, np.pi/180, threshold=300)

    angles = []

    # HoughLines returns None when no line reaches the vote threshold
    if lines is None:
        return angles

    for line in lines:
        rho, theta = line[0]
        a = np.cos(theta)
        b = np.sin(theta)
        x0 = a*rho
        y0 = b*rho
        x1 = int(x0 + 1000*(-b))
        y1 = int(y0 + 1000*(a))
        x2 = int(x0 - 1000*(-b))
        y2 = int(y0 - 1000*(a))
        
        radians = np.arctan2(y2-y1, x2-x1)
        degrees = np.degrees(radians)

        angles.append(degrees)

    return angles
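The endpoint arithmetic in get_angles can be checked without OpenCV. The two endpoints differ by roughly (2000·sin θ, −2000·cos θ), so for θ in [0, π) the returned angle is approximately θ − 90°, with small deviations caused by the int() truncation. The sketch below repeats the same arithmetic for a single (rho, theta) pair; the function name `line_angle_degrees` and the test values are hypothetical.

```python
import math

def line_angle_degrees(rho, theta):
    """Same endpoint arithmetic as get_angles, for a single (rho, theta) pair."""
    a, b = math.cos(theta), math.sin(theta)
    x0, y0 = a * rho, b * rho
    x1, y1 = int(x0 + 1000 * (-b)), int(y0 + 1000 * a)
    x2, y2 = int(x0 - 1000 * (-b)), int(y0 - 1000 * a)
    return math.degrees(math.atan2(y2 - y1, x2 - x1))

# Compare the computed angle with theta - 90 for a few normal angles.
for theta_deg in (0, 45, 90, 135):
    angle = line_angle_degrees(100.0, math.radians(theta_deg))
    print(theta_deg, round(angle, 2), theta_deg - 90)
```

Read this way, a value of -90 corresponds to a vertical line (θ = 0) and a value near 0 to a horizontal line (θ = 90°), and the truncation accounts for near-zero values that are not exactly 0.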

After running this function, you get a long list of angles from the Hough transform. From an image that should not be rotated:

[-90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -0.974421553508672, -0.974421553508672, -0.974421553508672, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.9749091578796124, 0.9749091578796124, 0.9749091578796124, 0.9749091578796124, 1.0030752389838637, 1.0030752389838637, 3.9855957480807316, 3.9875880958503185]

From an image that should be rotated:

[-90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -88.99692476101613, -88.99692476101613, -88.99692476101613, -88.99692476101613, -88.99692476101613, -88.99692476101613, -88.99692476101613, -88.99692476101613, -88.99692476101613, -88.99642282400909, -88.99642282400909, -88.02210297626898, -87.99346106671473, -87.99346106671473, -87.99346106671473, -87.99346106671473, -87.99346106671473, -87.99346106671473, -87.99346106671473, -87.99346106671473, -87.99346106671473, -87.99346106671473, -87.99346106671473, -87.99346106671473, -87.99245711203707, -87.99245711203707, -87.99245711203707, -87.99245711203707, -86.99022425882445, -86.99022425882445, -86.98871912968818, -86.98871912968818, -86.98871912968818, -86.98871912968818, -86.98871912968818, -86.98871912968818, -86.98871912968818, -86.98871912968818, -86.98871912968818, -86.98871912968818, -86.98871912968818, -86.98871912968818, -86.01440425191927, -86.01440425191927, -86.01440425191927, -86.01241190414969, -86.01241190414969, -86.01241190414969, -86.01241190414969, -86.01241190414969, -86.01241190414969, -86.01241190414969, -86.01241190414969, -86.01241190414969, -86.01241190414969, -86.01241190414969, -85.00791883390836, -85.00791883390836, -85.00791883390836, -85.00791883390836, -85.00542418989113, -85.00542418989113, -0.974421553508672, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 
0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.9749091578796124, 85.9838177634312, 86.98871912968818, 86.98871912968818, 86.98871912968818, 86.99022425882445, 87.99346106671473, 87.99346106671473, 87.99346106671473, 87.99346106671473, 87.99346106671473, 87.99346106671473, 87.99346106671473, 87.99346106671473, 88.99692476101613, 88.99692476101613, 88.99692476101613, 88.99692476101613, 88.99692476101613, 88.99692476101613, 88.99692476101613, 88.99692476101613]

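Here I leave you a few options for choosing the rotation angle. Option 3 should work for the arrays I presented above, but adapt it to your case:

  1. Rotate the image by the middle angle.
  2. Use the mean of the first, middle, and last angles.
  3. Take the mean of the first 10 values and the mean of the last 10 values. If they differ too much, the image does not need rotating. If they are close, take the mean of those 20 values (first 10 and last 10) and use it as the rotation value.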
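The three options can be written down as small helpers. This is a sketch that follows the wording of the list literally and assumes the angles are sorted first, as the sample lists are. The function names and the `max_gap` cut-off for "differ too much" are hypothetical; the answer does not give a number.

```python
from statistics import mean, median

def angle_from_median(angles):
    """Option 1: the middle angle of the sorted list."""
    return median(angles)

def angle_from_first_middle_last(angles):
    """Option 2: mean of the first, middle, and last sorted angles."""
    ordered = sorted(angles)
    return mean([ordered[0], ordered[len(ordered) // 2], ordered[-1]])

def angle_from_extremes(angles, n=10, max_gap=5.0):
    """Option 3: compare the n smallest with the n largest angles."""
    ordered = sorted(angles)
    head, tail = ordered[:n], ordered[-n:]
    # max_gap (degrees) is an assumed cut-off for "differ too much".
    if abs(mean(tail) - mean(head)) > max_gap:
        return 0.0  # treated as "no rotation needed"
    return mean(head + tail)

sample = [-3.0, -2.5, -2.0, -2.0, -1.5]
print(angle_from_median(sample))
print(angle_from_first_middle_last(sample))
print(angle_from_extremes(sample, n=2))
```

One thing to keep in mind when adapting this: a line's angle is only defined modulo 180 degrees, so -90 and +90 describe the same direction, while a plain arithmetic mean treats them as far apart.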

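Below is a list of guides that I tested but that did not work (for me). I believe most of these packages do not work well if the image contains financial data (such as equations or tables). However, if your images contain only text, these guides might work for you:

  1. The first one gave -90 degrees both for images that needed rotating and for images that did not. https://becominghuman.ai/how-to-automatically-deskew-straighten-a-text-image-using-opencv-a0c30aed83df
  2. This one produced a lot of errors in Python 3. After fixing the code, the result did not work at all. https://mzucker.github.io/2016/08/15/page-dewarping.html
  3. You can add Mousam Singh's example above to this list. It did not work because Tesseract throws an error. Also, I am not sure that running Tesseract twice is wise.
  4. This package did not work for me. The approach is too simple. https://github.com/sbrunner/deskew
  5. Worth mentioning is Leptonica, which I have not tried; it is used by Tesseract, OpenCV, and other major packages. It requires managing a dependency that I did not want to deal with, but if you already have some experience with C, it might work for you. https://tpgit.github.io/Leptonica/skew_8c.html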