How to make Tesseract OCR output words in sentence form?

The result I am getting looks like this: http://i.stack.imgur.com/dM0qG.png

Is it possible to make Tesseract output the text in sentence/paragraph form, like this?

This is to certify that you have successfully PASSED the PHIL-IT General Certification Examination held on January 26, 2015 at the Cebu Institute of Technology - University, N. Bacalso Avenue, Cebu City 6000 Philippines.

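Since result is a list of Tessnet2.Word objects, and the text of each Word is stored in item.Text, you can:

  1. Create a list containing only the words' text (not the full Tessnet2.Word objects)
  2. Join this list, using a space as the separator

Assume your result is stored in a variable named result (that is, you ran var result = ocr.DoOCR(image, null);). Combining the two steps looks like this: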

// Select and ToList are LINQ extension methods, so this needs: using System.Linq;
string phrase = string.Join(" ", result.Select(x => x.Text).ToList());

The result is:

This is to certify that you have successfully PASSED the Phil-lT General Certification Examination held on [nnuag 26, 2015 at the Cebu Institute uf Tedmnlngy · University , N. Bacalso Avenue, Cebu City 6000 Philippines.

(It has some detection errors, but that is a separate issue.)
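
For anyone who wants to try the join step without Tessnet2 installed, here is a minimal, self-contained sketch. The Word class below is a hypothetical stand-in for Tessnet2.Word that carries only a Text property, and PhraseBuilder.ToPhrase is a helper name made up for this example. It also skips empty entries, which is an addition of mine and not part of the one-liner above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Tessnet2.Word; only Text is needed for this sketch.
public class Word
{
    public string Text { get; set; }
}

public static class PhraseBuilder
{
    // Project each word to its text, drop blank entries,
    // then join what is left with a single space.
    public static string ToPhrase(IEnumerable<Word> words)
    {
        return string.Join(" ", words
            .Select(w => w.Text)
            .Where(t => !string.IsNullOrWhiteSpace(t)));
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample data standing in for the list returned by the OCR call.
        var result = new List<Word>
        {
            new Word { Text = "This" },
            new Word { Text = "is" },
            new Word { Text = "" },
            new Word { Text = "to" },
            new Word { Text = "certify" },
        };

        Console.WriteLine(PhraseBuilder.ToPhrase(result)); // prints "This is to certify"
    }
}
```

With the real library you would pass the list returned by DoOCR instead of the sample data, assuming its elements expose Text as described above.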