How to detect words in an image with OpenCV and Tesseract properly

Question

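I am developing an application that reads an image file with OpenCV and processes the text in it with Tesseract. With the following code, Tesseract detects extra rectangles that do not contain any text.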

void Application::Application::OpenAndProcessImageFile(void)
{
    OPENFILENAMEA ofn;
    ZeroMemory(&ofn, sizeof(OPENFILENAMEA));

    char szFile[260] = { 0 };
    // Initialize remaining fields of OPENFILENAMEA structure
    ofn.lStructSize     = sizeof(ofn);
    ofn.hwndOwner       = mWindow->getHandle();
    ofn.lpstrFile       = szFile;
    ofn.nMaxFile        = sizeof(szFile);
    ofn.lpstrFilter     = "JPG\0*.JPG\0PNG\0*.PNG\0";
    ofn.nFilterIndex    = 1;
    ofn.lpstrFileTitle  = NULL;
    ofn.nMaxFileTitle   = 0;
    ofn.lpstrInitialDir = NULL;
    ofn.Flags           = OFN_PATHMUSTEXIST | OFN_FILEMUSTEXIST;

    //open the picture dialog and select the image
    if (GetOpenFileNameA(&ofn) == TRUE) {
        std::string filePath = ofn.lpstrFile;
        
        //load image
        mImage = cv::imread(filePath.c_str());

        //process image     
        tesseract::TessBaseAPI ocr = tesseract::TessBaseAPI();

        ocr.Init(NULL, "eng");
        ocr.SetImage(mImage.data, mImage.cols, mImage.rows, 3, mImage.step);

        Boxa* bounds = ocr.GetWords(NULL);
        for (int i = 0; i < bounds->n; ++i) {
            Box* b = bounds->box[i];
            cv::rectangle(mImage, { b->x,b->y,b->w,b->h }, { 0, 255, 0 }, 2);
        }

        boxaDestroy(&bounds); // release the Leptonica box array returned by GetWords
        ocr.End();
        
        //show image
        cv::destroyAllWindows();
        cv::imshow("İşlenmiş Resim", mImage);
    }
}

This is the output image:

As you can see, Tesseract processes regions that contain no words at all. How can I fix this?

Answer 1

Tesseract is built for character recognition, not text detection. Even in regions that contain no text, Tesseract may interpret some image features as text.

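What you need to do is first detect the text regions with a text detection algorithm, and then apply Tesseract to those regions only. Here is a very good tutorial on a DNN model for text detection.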
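As a minimal sketch of the glue between the two stages: assuming the detector reports each text region as four corner points (the tutorial's exact output format is not shown here), each region can be turned into a padded, axis-aligned rectangle that is clamped to the image before Tesseract sees it. `Point`, `Rect`, `clampTo` and `toOcrRect` are hypothetical names for illustration, not part of OpenCV or Tesseract.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical stand-ins for the detector's and Tesseract's geometry types.
struct Point { int x; int y; };
struct Rect  { int x; int y; int w; int h; };

// Limit v to the closed range [lo, hi].
static int clampTo(int v, int lo, int hi) { return std::max(lo, std::min(v, hi)); }

// Convert one detected quadrilateral into an axis-aligned rectangle,
// grow it by `pad` pixels on every side, and clamp it to the image.
// The result has w == 0 or h == 0 if the region lies outside the image.
Rect toOcrRect(const std::array<Point, 4>& quad, int imgW, int imgH, int pad)
{
    assert(imgW > 0 && imgH > 0 && pad >= 0);

    int minX = quad[0].x, maxX = quad[0].x;
    int minY = quad[0].y, maxY = quad[0].y;
    for (const Point& p : quad) {
        minX = std::min(minX, p.x);  maxX = std::max(maxX, p.x);
        minY = std::min(minY, p.y);  maxY = std::max(maxY, p.y);
    }

    // Pad first, then clamp each edge into the image.
    const int left   = clampTo(minX - pad, 0, imgW);
    const int top    = clampTo(minY - pad, 0, imgH);
    const int right  = clampTo(maxX + pad, 0, imgW);
    const int bottom = clampTo(maxY + pad, 0, imgH);

    return { left, top, right - left, bottom - top };
}

// Each non-empty rectangle would then be recognised on its own, for example:
//   ocr.SetRectangle(r.x, r.y, r.w, r.h);
//   char* text = ocr.GetUTF8Text();   // release with delete[] text
```

A few pixels of padding are used here because detectors tend to crop text tightly, which can clip the outer characters; the right amount is something to tune per image set.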

I quickly ran your image through that model, and this is the output:

You can get better results by changing the model's input parameters. I just used the defaults.
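The answer does not name the parameters. Detectors of this kind commonly expose at least a confidence threshold and a non-maximum suppression (NMS) overlap threshold, so, as an assumption, the sketch below shows what those two knobs do to a set of candidate boxes. `Candidate`, `iou` and `filterDetections` are hypothetical names, not OpenCV API; in OpenCV, the `score_threshold` and `nms_threshold` arguments of `cv::dnn::NMSBoxes` play the same roles.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical candidate box produced by a text detector.
struct Candidate { float x, y, w, h, score; };

// Intersection-over-union of two axis-aligned boxes, in [0, 1].
float iou(const Candidate& a, const Candidate& b)
{
    const float ix = std::max(0.0f, std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x));
    const float iy = std::max(0.0f, std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y));
    const float inter = ix * iy;
    const float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Drop candidates scoring below confThreshold, then greedily keep the
// highest-scoring boxes, discarding any box whose overlap with an
// already kept box exceeds nmsThreshold.
std::vector<Candidate> filterDetections(std::vector<Candidate> boxes,
                                        float confThreshold, float nmsThreshold)
{
    assert(confThreshold >= 0.0f && nmsThreshold >= 0.0f);

    boxes.erase(std::remove_if(boxes.begin(), boxes.end(),
                    [=](const Candidate& c) { return c.score < confThreshold; }),
                boxes.end());
    std::sort(boxes.begin(), boxes.end(),
              [](const Candidate& a, const Candidate& b) { return a.score > b.score; });

    std::vector<Candidate> kept;
    for (const Candidate& c : boxes) {
        bool overlaps = false;
        for (const Candidate& k : kept) {
            if (iou(c, k) > nmsThreshold) { overlaps = true; break; }
        }
        if (!overlaps) kept.push_back(c);
    }
    return kept;
}
```

Raising the confidence threshold removes weak detections, which is usually the first thing to try against boxes on non-text regions; lowering the NMS threshold suppresses more of the overlapping boxes around the same word.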

c++

opencv

tesseract