Does Tesseract's page segmentation mode 1 (--psm 1) have the same effect as deskewing images?

Tesseract provides a parameter to set the page segmentation mode (--psm).
Here are all the modes, as listed in the documentation:

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
                        bypassing hacks that are Tesseract-specific.

Is --psm 1 equivalent to deskewing the image and then using, for example, --psm 3?

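In my experience, both PSM 1 and PSM 3 correct text that is rotated by 90, 180, or 270 degrees, even though the documentation says only PSM 1 does this. However, you may notice that PSM 1 sometimes misses text during segmentation. This can happen when the text is not aligned in lines and the font sizes vary. In general, though, do not expect Tesseract to detect text orientations other than 0, 90, 180, or 270 degrees, so this is not the same as deskewing an image that is tilted by an arbitrary angle. Also, you usually need at least 50 characters on the page. :)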
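If you would rather not rely on the segmentation mode for this, one option is to run OSD yourself and rotate the image before OCR. The following is only a rough sketch, assuming the pytesseract and Pillow packages are installed. The helper names `parse_osd` and `ocr_with_orientation_fix` and the `min_conf` threshold are made up for illustration and are not part of Tesseract.

```python
import re


def parse_osd(osd_text):
    """Pull the 'Rotate' angle and orientation confidence out of OSD text output."""
    rotate = re.search(r"Rotate:\s*(\d+)", osd_text)
    conf = re.search(r"Orientation confidence:\s*([\d.]+)", osd_text)
    return (
        int(rotate.group(1)) if rotate else None,
        float(conf.group(1)) if conf else None,
    )


def ocr_with_orientation_fix(path, min_conf=1.0):
    """Run OSD, undo a 90/180/270 degree rotation, then OCR with --psm 3.

    min_conf is an arbitrary illustrative threshold, not a recommended value.
    """
    # Local imports so parse_osd stays usable without these packages.
    from PIL import Image
    import pytesseract

    image = Image.open(path)
    try:
        osd = pytesseract.image_to_osd(image)
    except pytesseract.TesseractError:
        # OSD can fail on pages with very little text; fall back to plain OCR.
        return pytesseract.image_to_string(image, config="--psm 3")

    angle, conf = parse_osd(osd)
    if angle and conf is not None and conf >= min_conf:
        # Assumption: "Rotate" is the clockwise correction, and Pillow's
        # rotate() turns counter-clockwise, hence the negative sign.
        image = image.rotate(-angle, expand=True)
    return pytesseract.image_to_string(image, config="--psm 3")
```

This only undoes quarter-turn rotations. A page that is tilted by a few degrees would still need a separate deskew step before OCR.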