Python 3: KeyError when the key exists in a dictionary of image files
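I'm using Python and os to build a key/value dictionary of the files in a directory, then preprocessing each image with OpenCV and extracting and printing its text with Tesseract.
End goal: a for loop that takes each image in the directory, appends its file name as a string to the path of grocery_cve_project, processes the image, and prints the extracted text to the console.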
import os
print('os imported')
# import packages
from PIL import Image
import pytesseract
import cv2
print('packages imported')
### Part 1: store image names in dictionary
dir_name = ".\grocery_cve_project"
# This is where we get our array
# of file names and store in results
result = os.listdir(dir_name)
key_index_store = {}
for i, e in enumerate(result):
    key_index_store[i] = e
    #print(i, e)
#print("Our key value store is: ")
#print(key_index_store)
# The types of file names we care about.
photo_extensions = [".jpg", ".png"]
# declare the tesseract executable path
pytesseract.pytesseract.tesseract_cmd = 'C:\Program Files\Tesseract-OCR\tesseract.exe'
### Part 2: image processing
for e in key_index_store[e]:
    image_to_ocr = cv2.imread('grocery_cve_project_\%s' % 'e')
    print(image_to_ocr)
    # convert to gray
    preprocessed_img = cv2.cvtColor(image_to_ocr, cv2.COLOR_BGR2GRAY)
    # step 2: do binary and Otsu thresholding
    preprocessed_img = cv2.threshold(preprocessed_img, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    # step 3: Median Blur to remove noise in image
    preprocessed_img = cv2.medianBlur(preprocessed_img, 3)
    '''Step 4: SAVE AND LOAD IMAGE AS PIL image'''
    # step 1: Save the processed image to convert to PIL image
    for i in key_index_store[i]:
        cv2.imwrite(("tempdir\temp_img_%s.jpg" % 'i'), preprocessed_img)
    # step 2: load the image as a PIL/Pillow image
    preprocessed__pil_img = Image.open('temp_img.jpg')
    # step 1: do OCR of image using Tesseract
    text_extracted = pytesseract.image_to_string(preprocessed__pil_img)
    #Step 2: print the text
    print(text_extracted)
(Grocery_env) D:\Documents\Python\Multiple file array>"1. grocery tesseract.py"
os imported
packages imported
Traceback (most recent call last):
  File "D:\Documents\Python\Multiple file array\1. grocery tesseract.py", line 44, in <module>
    for e in key_index_store[e]:
KeyError: 'file_99.png'
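From what I've read, this error occurs when an item does not exist in the dictionary. However, if I uncomment and run the print(i, e) on line 21, it prints a key/value pair for every file in the directory, and 'file_99' does exist, at index 236, and the file really is in the given directory.
The image directory is in the same folder as the source code.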
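In the first part, you populate a dictionary keyed by numeric index:
key_index_store = {}
for i, e in enumerate(result):
    key_index_store[i] = e
This is somewhat redundant, because result is a list and is already indexed by number.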
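Then, in the second part, you iterate over key_index_store[e].
This is most likely a mistake. By that point e still holds the last file name from the first loop ('file_99.png'), but the dictionary's keys are the integers from enumerate, so the lookup raises KeyError even though the file exists. Remove the [e]; and since iterating a dict yields its keys, loop over key_index_store.values() to get the file names.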
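To see why the key is reported as missing, here is a minimal sketch. The file names are made up and stand in for the output of os.listdir; the variable names leftover, name_is_key, name_is_value and lookup_error are only for illustration.

```python
# Hypothetical stand-in for os.listdir(dir_name)
result = ["file_1.png", "file_2.png", "file_99.png"]

key_index_store = {}
for i, e in enumerate(result):
    key_index_store[i] = e

# After the loop finishes, e is still bound to the last file name.
leftover = e

# The keys are the integers 0, 1, 2; the file name is a value, not a key.
name_is_key = leftover in key_index_store
name_is_value = leftover in key_index_store.values()

# Looking the file name up as a key is what raises the KeyError.
try:
    key_index_store[leftover]
    lookup_error = None
except KeyError as exc:
    lookup_error = exc

print(leftover, name_is_key, name_is_value, repr(lookup_error))
```

So the file is in the dictionary, just not as a key, which matches the traceback in the question.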
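If I understand your code correctly, I think you may be a little confused about how to get key/value pairs out of a dictionary. In this case, though, you don't even need a dict.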
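For reference, a short sketch of the usual ways to walk a dictionary. The dictionary below is a hypothetical two-entry version of key_index_store, not your real data.

```python
# Hypothetical dictionary shaped like key_index_store: index -> file name
key_index_store = {0: "file_1.png", 1: "file_2.png"}

# Iterating the dict itself yields its keys (the integer indexes here).
keys = [k for k in key_index_store]

# .values() yields the stored file names.
values = list(key_index_store.values())

# .items() yields (key, value) pairs, which can be unpacked in the loop header.
pairs = list(key_index_store.items())

for idx, filename in key_index_store.items():
    print(idx, filename)
```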
You can write all of this in a single loop:
for idx, filename in enumerate(result):
    image_to_ocr = cv2.imread(os.path.join(dir_name, filename))
    # ... your image processing code ...
    out_filename = os.path.join("tempdir", f"temp_img_{idx}.jpg")
    cv2.imwrite(out_filename, preprocessed_img)
    preprocessed_pil_img = Image.open(out_filename)
    # ... the rest ...
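The loop above also sidesteps two smaller problems in the original code, which the sketch below illustrates. In an ordinary string literal, a backslash followed by t is a tab character, so "tempdir\temp_img_%s.jpg" does not contain the folder separator you expect; and '%s' % 'e' formats the literal letter e rather than the variable e. The helper list_image_files is a hypothetical name, not part of your code; it assumes you want to use the photo_extensions list (currently unused) to skip non-image files.

```python
import os

photo_extensions = [".jpg", ".png"]

def list_image_files(dir_name, extensions=photo_extensions):
    """Return the sorted file names in dir_name that end with a wanted extension."""
    wanted = tuple(ext.lower() for ext in extensions)
    return sorted(
        name for name in os.listdir(dir_name)
        if name.lower().endswith(wanted)
    )

# "\t" inside a normal string literal is a tab, not a backslash plus "t".
broken = "tempdir\temp_img.jpg"
# A raw string keeps the backslash as written.
raw = r"tempdir\temp_img.jpg"
# os.path.join picks the right separator for the current OS.
joined = os.path.join("tempdir", "temp_img.jpg")

# Quoting the variable name formats the literal letter instead of its value.
e = "file_99.png"
literal = "%s" % 'e'
actual = "%s" % e
```

The same tab problem applies to the tesseract_cmd path, which also contains a backslash followed by t. Note too that cv2.imread returns None instead of raising when it cannot read a file, so it is worth checking for None before calling cv2.cvtColor.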