When extracting text from an image with pytesseract, the numbers are printed first and then the strings

Question

When I extract text from an image using pytesseract, all the numbers are printed first and the strings come afterwards. Why does this happen?

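import cv2
import pytesseract
from pytesseract import Output
from PIL import Image

# Load the screenshot (OpenCV returns a NumPy array)
imginput = cv2.imread('ss.png')
# Binarize: values above 180 become 255, everything else becomes 0
_, img1 = cv2.threshold(imginput, 180, 255, cv2.THRESH_BINARY)
# Convert the array to a PIL image before passing it to pytesseract
img = Image.fromarray(img1)
# Run OCR and return the result as a dict with a 'text' key
d = pytesseract.image_to_string(img, output_type=Output.DICT)
print(d)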

My output:

{'text': '71.\n\n72.\n\n73.\n\n74.\n\n75.\n\n76.\n\n77.\n\n78.\n\n79.\n\n80.\n\nPick out the synonym of the word ‘depositary’ :\n\n(A) inheritor (B) ward (C) patron (D) trustee\nThe fifth chapter comprises three sections.\n(A) of (B) with (C) no preposition (D) on\n\nAntonym of ‘abortive’ is :\n(A) _ successful (B) reproductive (C) instantaneous (D) fruitful\n\nThe one word for a person who doubts in religious practices :\n(A) _ stoic (B) sceptic (C) theist (D) pantheist\n\nThe idiom “bury the hatchet’ means .\n(A) keep enmity (B) open enmity (C) stop enmity (D) have no enmity\n\nVictor seldom visits his uncle, Add proper tag question.\n(A) doesn’t he ? (B) isn’the? (C) ishe? (D) does he ?\n\n‘Khalil Gibran is one of the greatest poets of the world.’ Pick out the comparative degree of\nthe sentence.\n\n(A) Khalil Gibran is greater than many other poets of the world.\n(B) Khalil Gibran is greater than any other poet of the world.\n(C) Khalil Gibran is greater than any other poets of the world.\n(D) Khalil Gibran is the greatest poet of the world.\n\nThe passive form of ‘I keep my books here.’ is :\n(A) My books keep here (B) My books are keeping here\n(C) Iam kept the books here (D) My books are kept here\n\nPick out the correctly spelt word.\n\n(A) Constellation (B) Consistancy\n(C) Conspirecy (D) Conservatary\nWe need two more players to the team. Supply suitable phrasal verb.\n(A) make out (B) make up (C) make for (D) make of\n11 052/2019 - M\n\n{P.T.0}'}

Answer 1

Try running a different page segmentation mode:

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR. (not implemented)
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.

Add it like this:

# Example of adding any additional options.
# `img` is the PIL image built in the question's code.
custom_oem_psm_config = r'--psm 6'
d = pytesseract.image_to_string(img, config=custom_oem_psm_config, output_type=Output.DICT)
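
A likely explanation, not confirmed in this thread, is that the default mode (3) runs automatic page segmentation and treats the column of question numbers as a text block separate from the questions. Modes 4 and 6 assume a single column or block, so they may keep each number with its line. The sketch below compares several modes on the same image; the helper names `psm_config` and `ocr_with_modes` are made up for illustration and are not part of pytesseract.

```python
def psm_config(psm, oem=None):
    """Build a Tesseract config string such as '--psm 6'.

    Hypothetical helper: it only formats the command-line flags that
    pytesseract forwards to Tesseract through its `config` argument.
    """
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    parts = []
    if oem is not None:
        parts.append(f"--oem {oem}")
    parts.append(f"--psm {psm}")
    return " ".join(parts)


def ocr_with_modes(image, modes=(3, 4, 6)):
    """Run OCR once per page segmentation mode and collect the text.

    Returns a dict mapping each mode number to the recognized string.
    pytesseract is imported lazily so psm_config works without it.
    """
    import pytesseract

    results = {}
    for psm in modes:
        results[psm] = pytesseract.image_to_string(image, config=psm_config(psm))
    return results


# Usage, assuming `img` is the PIL image from the question:
# for psm, text in ocr_with_modes(img).items():
#     print(f"--- psm {psm} ---")
#     print(text)
```

Which mode gives the best reading order depends on the page layout, so it is worth inspecting the output of each one rather than assuming mode 6 is always right.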


python-tesseract