How to make Tesseract OCR output words in sentence form?
The result I get looks like this:
http://i.stack.imgur.com/dM0qG.png
Is it possible to make Tesseract output it in sentence/paragraph form, like this?
This is to certify that you have successfully PASSED the PHIL-IT General Certification Examination held on January 26, 2015 at the Cebu Institute of Technology - University, N. Bacalso Avenue, Cebu City 6000 Philippines.
Since result is a List of Tessnet2.Word objects, and each Word stores its text in item.Text, you can:
- Build a list containing only the text of each word (not the full Tessnet2.Word objects)
- Join that list, using a space as the separator
Assume your result is stored in a variable named result, obtained with:
var result = ocr.DoOCR(image, null);
Combining the two steps then looks like this:
string phrase = string.Join(" ", result.Select(x => x.Text).ToList());
The result is:
This is to certify that you have successfully PASSED the Phil-lT
General Certification Examination held on [nnuag 26, 2015 at the Cebu
Institute uf Tedmnlngy · University , N. Bacalso Avenue, Cebu City
6000 Philippines.
(It contains some recognition errors, but that is a separate issue.)
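For reference, the two steps above can be sketched as a small self-contained program. This is only an illustration under stated assumptions: the Word class below is a hypothetical stand-in that models nothing but the Text member of Tessnet2.Word, and WordJoiner.ToSentence is a name made up for this example. It also skips empty entries, which the one-liner above does not do, and it relies on the string.Join overload that accepts an IEnumerable<string> (available from .NET Framework 4), so no ToList() call is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Tessnet2.Word; only the Text member is modelled.
public class Word
{
    public string Text { get; set; }
}

public static class WordJoiner
{
    // Step 1: project each word to its text.
    // Step 2: join the texts with a single space, skipping blank entries.
    public static string ToSentence(IEnumerable<Word> words)
    {
        return string.Join(" ", words
            .Select(w => w.Text)
            .Where(t => !string.IsNullOrWhiteSpace(t)));
    }
}

public static class Program
{
    public static void Main()
    {
        // In real code this list would come from ocr.DoOCR(...).
        var result = new List<Word>
        {
            new Word { Text = "This" },
            new Word { Text = "is" },
            new Word { Text = "" },
            new Word { Text = "a" },
            new Word { Text = "test." },
        };

        Console.WriteLine(WordJoiner.ToSentence(result)); // prints "This is a test."
    }
}
```

With the real library, the same call would presumably be WordJoiner.ToSentence(result) after changing the parameter type to IEnumerable<Tessnet2.Word>; that substitution is untested here.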