Should I crop or pad Tesseract OCR learning boxes?

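I'm currently teaching Tesseract v3.02 to recognise UK driving licence cards. I'm using QT Box Editor to generate the .box files so that I can "train" Tesseract on the font and layout of these documents. This raises a question: should I crop each letter tightly, or is it better to give it some padding on all sides, for example 1px?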
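
If you want to try the padded variant without redrawing every box by hand in QT Box Editor, you can grow the boxes in an existing .box file with a small script. This is a minimal sketch, assuming the Tesseract 3.x box format of one glyph per line, `<glyph> <left> <bottom> <right> <top> <page>`, with coordinates measured from the bottom-left corner of the image. The function name `pad_box_lines` is hypothetical, and the image width and height must be supplied by you.

```python
def pad_box_lines(lines, pad, img_width, img_height):
    """Grow every box by `pad` pixels on each side, clamped to the image.

    Assumes Tesseract 3.x box lines: "<glyph> <left> <bottom> <right> <top> [<page>]",
    with the origin at the bottom-left corner of the image.
    """
    padded = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        glyph = fields[0]
        left, bottom, right, top = (int(v) for v in fields[1:5])
        page = fields[5:]  # optional page-number field, kept unchanged

        # Expand outwards, but never past the image edges.
        left = max(0, left - pad)
        bottom = max(0, bottom - pad)
        right = min(img_width, right + pad)
        top = min(img_height, top + pad)

        padded.append(" ".join([glyph, str(left), str(bottom), str(right), str(top)] + page))
    return padded


if __name__ == "__main__":
    sample = ["A 10 20 30 40 0", "b 0 0 5 5 0"]
    for out in pad_box_lines(sample, pad=1, img_width=100, img_height=100):
        print(out)
```

Running it on the two sample lines widens the first box by 1px on every side and leaves the second box's left and bottom edges at 0, because they already touch the image border.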

In short, the answer is "whatever seems to make the engine recognise text the best". You will only know once you have compiled the *.traineddata file and tested it.
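
One way to make that test concrete is to build one *.traineddata from tightly cropped boxes and one from padded boxes, run each over the same set of licence images (for example with `tesseract image.png outbase -l <lang>`), and compare the output against a hand-typed transcription. The sketch below scores each candidate by character error rate (edit distance divided by the length of the reference text). The function names and the model labels `"crop"` and `"pad"` are hypothetical, and this metric is one reasonable choice, not something Tesseract itself prescribes.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def char_error_rate(truth: str, ocr: str) -> float:
    """Edit distance normalised by the length of the ground-truth text."""
    if not truth:
        return 0.0 if not ocr else 1.0
    return levenshtein(truth, ocr) / len(truth)


def best_model(truth: str, outputs: dict) -> str:
    """Return the label of the model whose OCR output has the lowest error rate."""
    return min(outputs, key=lambda name: char_error_rate(truth, outputs[name]))


if __name__ == "__main__":
    truth = "DRIVING LICENCE"
    outputs = {"crop": "DRlVING LICENCE", "pad": "DRIVING LICENCE"}
    for name, text in outputs.items():
        print(name, round(char_error_rate(truth, text), 3))
    print("best:", best_model(truth, outputs))
```

In practice you would average the error rate over many cards rather than a single string, since one image can easily favour either variant by chance.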