Checking the accuracy of Tesseract OCR output in Python
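I have run Tesseract OCR to convert an image file to a string.
I now have the output.
How can I compare the original PNG file with the output text file to check whether the result is accurate?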
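from PIL import Image
from pytesseract import image_to_string

basewidth = 2700
img = Image.open(r'D:OCR\page1.png')
# Scale the height by the same factor as the width to keep the aspect ratio
wpercent = basewidth / float(img.size[0])
hsize = int(float(img.size[1]) * wpercent)
# Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter
img = img.resize((basewidth, hsize), Image.LANCZOS)
img.save('page1_zoom.png')
# Run OCR on the resized image that was just saved
print(image_to_string(Image.open('page1_zoom.png')))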
How can I check whether the content is accurate?
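You will definitely need some manual baseline (golden) data to compare the results against. That means having your own test data, or at least a set of parameters to verify.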
Test cases could be something like:
1. The whole text content
2. Number of lines
3. Number of paragraphs
4. Position of the text
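The first three checks above can be sketched roughly as follows. This is a minimal example, not a standard accuracy metric: it assumes you have typed the expected text by hand, and it uses `difflib.SequenceMatcher` from the standard library as a simple similarity score. The helper names (`normalize`, `text_similarity`, `count_lines`, `count_paragraphs`, `compare_to_golden`) are made up for illustration and are not part of pytesseract.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Collapse runs of whitespace so layout differences do not dominate the score."""
    return " ".join(text.split())


def text_similarity(expected, actual):
    """Return a 0..1 similarity ratio between the golden text and the OCR output."""
    return SequenceMatcher(None, normalize(expected), normalize(actual)).ratio()


def count_lines(text):
    """Count non-empty lines."""
    return sum(1 for line in text.splitlines() if line.strip())


def count_paragraphs(text):
    """Count blocks of non-empty lines separated by at least one blank line."""
    count, in_paragraph = 0, False
    for line in text.splitlines():
        if line.strip():
            if not in_paragraph:
                count += 1
            in_paragraph = True
        else:
            in_paragraph = False
    return count


def compare_to_golden(expected, actual):
    """Summarise how the OCR output differs from the hand-checked golden text."""
    return {
        "similarity": text_similarity(expected, actual),
        "lines_expected": count_lines(expected),
        "lines_actual": count_lines(actual),
        "paragraphs_expected": count_paragraphs(expected),
        "paragraphs_actual": count_paragraphs(actual),
    }


if __name__ == "__main__":
    # Hypothetical sample strings; in practice `ocr_output` would come from
    # image_to_string(...) and `golden` from a hand-checked text file.
    golden = "Hello world\nSecond line\n\nNew paragraph"
    ocr_output = "Hello world\nSecond 1ine\n\nNew paragraph"
    print(compare_to_golden(golden, ocr_output))
```

For the fourth check (position of text), pytesseract also offers `image_to_data`, which reports a bounding box for each recognised word; those boxes could be compared against hand-measured positions in a similar way.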
Tesseract vs. Google OCR:
If you want to compare Tesseract's accuracy with another OCR engine, you can
try Google's OCR, which often gives better results than Tesseract (even though
Google has also long sponsored Tesseract's development).
Tesseract training:
Tesseract also provides a training feature that can be used to improve the accuracy of its results.