What is a Blob in Tesseract OCR?
I am learning Tesseract OCR and reading this article, which is based on this article. From the first article:
First step is Adaptive Thresholding, which converts the image into
binary images. Next step is connected component analysis which is
used to extract character outlines. This method is very useful
because it does the OCR of image with white text and black background.
Tesseract was probably first to provide this kind of
processing. Then after, the outlines are converted into Blobs.
Blobs are organized into text lines, and the lines and
regions are analyzed for some fixed area or equivalent text
size.
Can anyone explain what a Blob is?
From https://tesseract-ocr.repairfaq.org/tess_glossary.html:
Blob
Isolated, small region of the scanned image. It's delineated by the outline. Tesseract 'juggles' the blobs to see if they can be split further into something that improved the confidence of recognition. Sometimes, blobs are 'combined' if that gives a better result. See pithsync.cpp, for example.
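In general, a blob (also called a connected component) is a connected region of a binary image, i.e. a set of foreground pixels with no breaks between them. In other words, it is a solid element in a binary image.
Blob finding is a key step in any system that aims to extract or measure data from digital images.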
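To make the idea of a connected component concrete, here is a minimal sketch in plain Python. It is not Tesseract's code: Tesseract works on outlines in C++, while this is a generic flood-fill labelling over a small hand-made binary image. The names `find_blobs` and `bounding_box` are made up for illustration, and the choice of 8-connectivity (diagonal neighbours count as touching) is an assumption.

```python
from collections import deque


def find_blobs(image):
    """Return the 8-connected foreground regions (pixels equal to 1).

    Each blob is a list of (row, col) pixel coordinates.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blobs.append(pixels)
    return blobs


def bounding_box(pixels):
    """Return (top, left, bottom, right) of a blob, inclusive."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return (min(ys), min(xs), max(ys), max(xs))


# A tiny binary image: 1 = ink (foreground), 0 = background.
image = [
    [1, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
]

blobs = find_blobs(image)
boxes = [bounding_box(b) for b in blobs]
print(len(blobs), boxes)
```

Each group of touching foreground pixels comes back as one blob. In an OCR setting a blob is often a character or a piece of one, which is why the glossary entry talks about splitting and combining blobs to improve recognition.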