Text recognition and restructuring (OCR, OpenCV)
Link to the original image: https://ibb.co/0VC6vkX
I am currently working on an OCR project. I preprocessed the image and then applied the pretrained EAST model for text detection.
import cv2
import numpy as np
from imutils.object_detection import non_max_suppression
import matplotlib.pyplot as plt
%matplotlib inline

img = cv2.imread('bw_image.jpg')
model = cv2.dnn.readNet('frozen_east_text_detection.pb')

# Prepare the image
# use a multiple of 32 to set the new image shape
height, width, colorch = img.shape
new_height = (height//32)*32
new_width = (width//32)*32
print(new_height, new_width)

h_ratio = height/new_height
w_ratio = width/new_width
print(h_ratio, w_ratio)

# blobFromImage helps us prepare the image
blob = cv2.dnn.blobFromImage(img, 1, (new_width, new_height), (123.68, 116.78, 103.94), True, False)
model.setInput(blob)

# this model outputs geometry and score maps
(geometry, scores) = model.forward(model.getUnconnectedOutLayersNames())

# once we have the geometry and score maps, post-process them to obtain the final text boxes
rectangles = []
confidence_score = []
for i in range(geometry.shape[2]):
    for j in range(0, geometry.shape[3]):
        if scores[0][0][i][j] < 0.1:
            continue
        bottom_x = int(j*4 + geometry[0][1][i][j])
        bottom_y = int(i*4 + geometry[0][2][i][j])
        top_x = int(j*4 - geometry[0][3][i][j])
        top_y = int(i*4 - geometry[0][0][i][j])
        rectangles.append((top_x, top_y, bottom_x, bottom_y))
        confidence_score.append(float(scores[0][0][i][j]))

# use NMS to keep only the required rectangles
final_boxes = non_max_suppression(np.array(rectangles), probs=confidence_score, overlapThresh=0.5)

# finally, to display these text boxes, iterate over them and scale them back to the original shape
# using the ratios we calculated earlier
img_copy = img.copy()
for (x1, y1, x2, y2) in final_boxes:
    x1 = int(x1 * w_ratio)
    y1 = int(y1 * h_ratio)
    x2 = int(x2 * w_ratio)
    y2 = int(y2 * h_ratio)
    # to draw the rectangles on the image, use the cv2.rectangle function
    cv2.rectangle(img_copy, (x1, y1), (x2, y2), (0, 255, 0), 2)
This gives us the detected text, as shown below:
Now, for text recognition, I am using the pretrained OpenCV CRNN model as follows:
# Download the CRNN model and load it
model1 = cv2.dnn.readNet('D:/downloads/crnn.onnx')

# Prepare the image
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blob = cv2.dnn.blobFromImage(img_gray, scalefactor=1/127.5, size=(100, 32), mean=127.5)

# Pass the image to the network and extract per-timestep scores
model1.setInput(blob)
scores = model1.forward()
print(scores.shape)

alphabet_set = "0123456789abcdefghijklmnopqrstuvwxyz"
blank = '-'
char_set = blank + alphabet_set

# Decode the scores to text
def most_likely(scores, char_set):
    text = ""
    for i in range(scores.shape[0]):
        c = np.argmax(scores[i][0])
        text += char_set[c]
    return text

def map_rule(text):
    char_list = []
    for i in range(len(text)):
        if i == 0:
            if text[i] != '-':
                char_list.append(text[i])
        else:
            if text[i] != '-' and (not (text[i] == text[i - 1])):
                char_list.append(text[i])
    return ''.join(char_list)

def best_path(scores, char_set):
    text = most_likely(scores, char_set)
    final_text = map_rule(text)
    return final_text

out = best_path(scores, char_set)
print(out)
But applying this model to the image gives the following output:
saetan
I really don't understand this. Can anyone explain what is going wrong in the text recognition step? Is there a problem with the pretrained CRNN model? Beyond that, once the text is recognized, I also want to restructure it the way it is laid out in the original image. Assuming the recognition problem is solved and we have the bounding-box coordinates and the recognized text, how can we accurately reconstruct the text? Any help would be appreciated.
Edit: I tried the pytesseract image_to_string() and image_to_data() functions, but they did not perform well. If this CRNN model is not suitable, is there another pretrained text recognition model I could use, so that I can replicate the success of my EAST Text Detection model? That way I could accurately restructure the text as it appears in the image, with the help of the coordinates (bounding boxes) obtained from the EAST model.
Here is a possible solution; you can improve it by trying a few things:
- varying the Gaussian blur parameters
- thresholding the blurred image to see whether it improves the result (a sketch follows the sample output below)
Code:
import cv2
import pytesseract

gray = cv2.imread('/path/to/your_image.jpeg', cv2.IMREAD_GRAYSCALE)
g = cv2.GaussianBlur(gray, (3, 3), 0.5)
config = "-l eng --oem 1 --psm 6"
text = pytesseract.image_to_string(g, config=config)
print(text)
Resulting text (partial):
400242 | 6161108006012 BIO WHOLE MILK 1LTR \ 1PCS 12.Cu PCS 430.50 1,566.00
400365 | 6161108000119 BIO YOG VANILLA 150ML CUP ! 1PCS 24.05 PCS 91.02 2,184.36
400545 | 6161108000584 BIO LONG LIFE COOKING CREAM SOOML 1 1PCS \ 12.Gu PCS 241.32 2,895.78
74 - i :
400821 | 6161108005060" | BIO YOGHURT STRAWBERRY 450ML | 1Pcs 6.50 PCS 266.37 1,598.23
400822 , 6161108005207 BIO YOGHURT VANILLA 90ML ; 1PCS ! 36.0b FCS 60.96 2,194.38
450466 | 6166000051801 KENTASTE COCONUT MILK 400ML ; 1CTN * 12 PCS | 2.00 TN 1,920.96 3,841.92
, 450469 | 6166000051818 KENTASTE COCONUT CREAM 400ML : 1CTN* 12 PCS | 2.0. CTN 2,213.28 4,426.56
450465 | 6166000051887 KENTASTE COCONUT OIL 700ML 1CTN * 12 PCS | Iso) STN) 7,697.76 7,697.76
400985 | 6161108000812 BIO WHOLE MILK LONG LIFE SOOML EPCS: 12.00 PCS | 67.40 808.79
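As a hedged follow-up to the thresholding suggestion above, here is a minimal sketch, assuming the same grayscale image; Otsu's method is just one possible choice of threshold:

import cv2
import pytesseract

gray = cv2.imread('/path/to/your_image.jpeg', cv2.IMREAD_GRAYSCALE)
g = cv2.GaussianBlur(gray, (3, 3), 0.5)
# Otsu picks a global threshold automatically; the threshold argument (0) is ignored
_, binarized = cv2.threshold(g, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
text = pytesseract.image_to_string(binarized, config="-l eng --oem 1 --psm 6")
print(text)

Whether this helps depends on the scan quality; under uneven lighting, cv2.adaptiveThreshold may work better than a single global Otsu threshold.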
On closer inspection, I found quite a few problems in your code. If I understand correctly how you want to run the crnn recognizer, the following solution should fix your problem.
Your function definitions (these implement CTC best-path decoding: pick the most likely character per timestep, then collapse repeats and drop the blank, e.g. '--hh-e-ll-llo' decodes to 'hello'):
# Decode the scores to text
def most_likely(scores, char_set):
    text = ""
    for i in range(scores.shape[0]):
        c = np.argmax(scores[i][0])
        text += char_set[c]
    return text

def map_rule(text):
    char_list = []
    for i in range(len(text)):
        if i == 0:
            if text[i] != '-':
                char_list.append(text[i])
        else:
            if text[i] != '-' and (not (text[i] == text[i - 1])):
                char_list.append(text[i])
    return ''.join(char_list)

def best_path(scores, char_set):
    text = most_likely(scores, char_set)
    final_text = map_rule(text)
    return final_text
Most of the code below is what you already have; compare it with your version to spot the changes:
import cv2
import numpy as np
from imutils.object_detection import non_max_suppression
import matplotlib.pyplot as plt
%matplotlib inline

img = cv2.imread('/path/to/your_image.jpeg')
model = cv2.dnn.readNet('/path/to/frozen_east_text_detection.pb')

# Prepare the image
# use a multiple of 32 to set the new image shape
height, width, colorch = img.shape
new_height = (height//32)*32
new_width = (width//32)*32
print(new_height, new_width)

h_ratio = height/new_height
w_ratio = width/new_width
print(h_ratio, w_ratio)

# blobFromImage helps us prepare the image
blob = cv2.dnn.blobFromImage(img, 1, (new_width, new_height), (123.68, 116.78, 103.94), True, False)
model.setInput(blob)

# this model outputs geometry and score maps
(geometry, scores) = model.forward(model.getUnconnectedOutLayersNames())

# once we have the geometry and score maps, post-process them to obtain the final text boxes
rectangles = []
confidence_score = []
for i in range(geometry.shape[2]):
    for j in range(0, geometry.shape[3]):
        if scores[0][0][i][j] < 0.1:
            continue
        bottom_x = int(j*4 + geometry[0][1][i][j])
        bottom_y = int(i*4 + geometry[0][2][i][j])
        top_x = int(j*4 - geometry[0][3][i][j])
        top_y = int(i*4 - geometry[0][0][i][j])
        rectangles.append((top_x, top_y, bottom_x, bottom_y))
        confidence_score.append(float(scores[0][0][i][j]))

# use NMS to keep only the required rectangles
final_boxes = non_max_suppression(np.array(rectangles), probs=confidence_score, overlapThresh=0.5)

model1 = cv2.dnn.readNet('/path/to/crnn.onnx')

# Prepare the image
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

alphabet_set = "0123456789abcdefghijklmnopqrstuvwxyz"
blank = '-'
char_set = blank + alphabet_set

for (x1, y1, x2, y2) in final_boxes:
    x1 = int(x1 * w_ratio)
    y1 = int(y1 * h_ratio)
    x2 = int(x2 * w_ratio)
    y2 = int(y2 * h_ratio)
    # Run recognition on each detected text box instead of the whole image
    blob = cv2.dnn.blobFromImage(img_gray[y1:y2, x1:x2], scalefactor=1/127.5, size=(100, 32), mean=127.5)
    # Pass the crop to the network and extract per-timestep scores
    model1.setInput(blob)
    scores = model1.forward()
    print(scores.shape)
    out = best_path(scores, char_set)
    print(out)
You may want to overlay the recognized text on the image to check its accuracy. There are better ways to do such testing, but that is beyond the scope of this question.
Handling the crops is quite simple; just change your last loop a little:
import pytesseract
from PIL import Image
...
for x1, y1, x2, y2 in final_boxes:
    # to draw the rectangles on the image, use the cv2.rectangle function
    # cv2.rectangle(img_copy, (x1, y1), (x2, y2), (0, 255, 0), 2)
    img_crop = Image.fromarray(img[y1-1: y2+1, x1-1: x2+1])
    text = pytesseract.image_to_string(img_crop, config='--psm 8').strip()
    cv2.putText(img_copy, text, (x1, y1), 0, 0.7, (0, 0, 255), 2)
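As for the restructuring part of the question: once every box has a recognized string, a minimal sketch (not a definitive implementation) is to group the boxes into lines by their vertical centers and then sort each line left to right. This assumes you collected (x1, y1, x2, y2, text) tuples in a list called results inside the loop above:

def reconstruct_lines(results, y_tolerance=10):
    # Sort by vertical center so boxes that share a line become adjacent
    boxes = sorted(results, key=lambda b: (b[1] + b[3]) / 2)
    lines = []
    for box in boxes:
        cy = (box[1] + box[3]) / 2
        # Extend the current line if the center is close enough, else start a new one
        if lines and abs(cy - lines[-1][0]) <= y_tolerance:
            lines[-1][1].append(box)
        else:
            lines.append((cy, [box]))
    # Order each line's boxes by their left edge and join the recognized strings
    return ['\t'.join(b[4] for b in sorted(line, key=lambda b: b[0]))
            for _, line in lines]

print('\n'.join(reconstruct_lines(results)))

The y_tolerance value is a guess and depends on your image resolution; for a table-like invoice you could go further and snap the x coordinates to columns.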