Improve real-life results of a neural network trained with the MNIST dataset
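I have built a neural network with keras using the mnist dataset, and now I'm trying to use it on photos of real handwritten digits. Of course I don't expect the results to be perfect, but the results I currently get leave a lot of room for improvement.
To begin with, I test it with some photos of individual digits written in my clearest handwriting. They are square and have the same dimensions and colors as the images in the mnist dataset. They are saved in a folder called individual_test, for example: 7(2)_digit.jpg.
The network is often very sure of wrong results. Here is an example:
For this image I get the following result: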
result: 3 . probabilities: [1.9963557196245318e-10, 7.241294497362105e-07, 0.02658148668706417, 0.9726449251174927, 2.5416460047722467e-08, 2.6078915027483163e-08, 0.00019745019380934536, 4.8302300825753264e-08, 0.0005754049634560943, 2.8358477788259506e-09]
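So the network is 97% sure that this is a 3, and this picture is not the only example: only 16 of 38 pictures were recognized correctly. What shocks me is that the network is so sure of its result even though it is far from the correct one.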
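Edit
After adding a threshold to prepare_image (img = cv2.threshold(img, 0.1, 1, cv2.THRESH_BINARY_INV)[1]), the performance improved slightly. Now 19 of 38 pictures are correct, but for some pictures, including the one shown above, the network is still very sure of a wrong result. This is what I get now: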
result: 3 . probabilities: [1.0909866760000497e-11, 1.1584616004256532e-06, 0.27739930152893066, 0.7221096158027649, 1.900260038212309e-08, 6.555900711191498e-08, 4.479645940591581e-05, 6.455550760620099e-07, 0.0004443934594746679, 1.0013242457418414e-09]
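So now it is only 72% sure of its result, which is better, but still...
What can I do to improve the performance? Can I prepare my images better? Or should I add my own images to the training data? If so, how would I do that?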
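To the last question: one way to add your own images is to append them to the flattened MNIST arrays before `model.fit` is called in `__init__`. The sketch below is a hypothetical helper (`add_own_samples` and its `repeat` parameter are not part of the original code) and assumes your images have already been preprocessed to look like MNIST (white digit on black, values in [0, 1]). Because 38 images are tiny next to the 60,000 MNIST training images, it can repeat them so they carry some weight.

```python
import numpy as np

def add_own_samples(x_train, y_train, own_x, own_y, num_classes=10, repeat=1, seed=0):
    """Append hand-labelled images to flattened, scaled MNIST training data.

    x_train: (N, 784) float32 in [0, 1]; y_train: (N, num_classes) one-hot.
    own_x:   (M, 28, 28) or (M, 784), already preprocessed like MNIST.
    own_y:   (M,) integer labels 0..9.
    """
    own_x = np.asarray(own_x, dtype=np.float32).reshape(len(own_x), -1)
    # one-hot encode the new labels in the same layout as to_categorical
    own_y = np.eye(num_classes, dtype=np.float32)[np.asarray(own_y, dtype=int)]
    # repeat the few own samples so they are not drowned out by the MNIST images
    own_x = np.repeat(own_x, repeat, axis=0)
    own_y = np.repeat(own_y, repeat, axis=0)
    x = np.concatenate([x_train, own_x], axis=0)
    y = np.concatenate([y_train, own_y], axis=0)
    # shuffle so the new samples are spread over all mini-batches
    perm = np.random.default_rng(seed).permutation(len(x))
    return x[perm], y[perm]

# hypothetical usage inside __init__, before self.model.fit(...):
# X_train, y_train = add_own_samples(X_train, y_train, my_images, my_labels, repeat=50)
```

Leaving `X_test`/`y_test` untouched in `validation_data` means the reported accuracy still measures plain MNIST performance, so your own photos should be evaluated separately (ideally on photos that were not added to training).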
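Edit
This is what the image shown above looks like after applying prepare_image:
This is what the same image looks like after thresholding:
For comparison, this is one of the images from the mnist dataset:
They look fairly similar to me. How can I improve this?
This is my code (including the threshold):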
# import keras and the MNIST dataset
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
# numpy is necessary since keras uses numpy arrays
import numpy as np
# imports for pictures
import matplotlib.pyplot as plt
import PIL.Image
import cv2
# imports for tests
import os


class mnist_network():
    def __init__(self):
        """ load data, create and train model """
        # load data
        (X_train, y_train), (X_test, y_test) = mnist.load_data()
        # flatten 28*28 images to a 784 vector for each image
        num_pixels = X_train.shape[1] * X_train.shape[2]
        X_train = X_train.reshape((X_train.shape[0], num_pixels)).astype('float32')
        X_test = X_test.reshape((X_test.shape[0], num_pixels)).astype('float32')
        # normalize inputs from 0-255 to 0-1
        X_train = X_train / 255
        X_test = X_test / 255
        # one hot encode outputs
        y_train = to_categorical(y_train)
        y_test = to_categorical(y_test)
        num_classes = y_test.shape[1]
        # create model
        self.model = Sequential()
        self.model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer='normal', activation='relu'))
        self.model.add(Dense(num_classes, kernel_initializer='normal', activation='softmax'))
        # compile model
        self.model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        # train the model
        self.model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)
        self.train_img = X_train
        self.train_res = y_train
        self.test_img = X_test
        self.test_res = y_test

    def predict_result(self, img, show=False):
        """ predicts the number in a picture (vector) """
        assert isinstance(img, np.ndarray) and img.shape == (784,)
        if show:
            # show the picture
            plt.imshow(img.reshape((28, 28)), cmap='Greys')
            plt.show()
        # run the model once; the batch contains a single image, so take row 0
        probabilities = self.model.predict(img.reshape(1, -1))[0]
        # the actual number is the class with the highest probability
        return int(np.argmax(probabilities)), probabilities.tolist()

    def prepare_image(self, img, show=False):
        """ prepares the partial images used in partial_img_rec by transforming them
        into numpy arrays that the network will be able to process """
        # convert to greyscale
        img = img.convert("L")
        # rescale image to 28*28 (LANCZOS is the filter formerly called ANTIALIAS)
        img = img.resize((28, 28), PIL.Image.LANCZOS)
        # inverse colors since the training images have a black background
        #img = PIL.ImageOps.invert(img)
        # transform to vector
        img = np.asarray(img, "float32")
        img = img / 255.
        img[img < 0.5] = 0.
        img = cv2.threshold(img, 0.1, 1, cv2.THRESH_BINARY_INV)[1]
        if show:
            plt.imshow(img, cmap="Greys")
            plt.show()
        # flatten image to 28*28 = 784 vector
        num_pixels = img.shape[0] * img.shape[1]
        img = img.reshape(num_pixels)
        return img

    def partial_img_rec(self, image, upper_left, lower_right, results=None, show=False):
        """ partial is a part of an image """
        # avoid a shared mutable default list between calls
        if results is None:
            results = []
        left_x, left_y = upper_left
        right_x, right_y = lower_right
        print("current test part: ", upper_left, lower_right)
        print("results: ", results)
        # condition to stop recursion: we've reached the full width of the picture
        width, height = image.size
        if right_x > width:
            return results
        partial = image.crop((left_x, left_y, right_x, right_y))
        if show:
            partial.show()
        partial = self.prepare_image(partial)
        # is there a number in this part of the image?
        res, prop = self.predict_result(partial)
        print("result: ", res, ". probabilities: ", prop)
        # only count this result if the network is at least 50% sure
        if prop[res] >= 0.5:
            results.append(res)
            # step is 80% of the partial image's size (which is equivalent to the original image's height)
            step = int(height * 0.8)
            print("found valid result")
        else:
            # if no number is found we take smaller steps (at least 1 px, so the recursion ends)
            step = max(1, height // 20)
        print("step: ", step)
        # recursive call with the window moved right by step
        return self.partial_img_rec(image, (left_x + step, left_y), (right_x + step, right_y), results=results, show=show)

    def individual_digits(self, img):
        """ uses partial_img_rec to predict individual digits in square images """
        # JpegImageFile and PngImageFile are both subclasses of PIL.Image.Image
        assert isinstance(img, PIL.Image.Image)
        return self.partial_img_rec(img, (0, 0), (img.size[0], img.size[1]), results=[])

    def test_individual_digits(self):
        """ test partial_img_rec with some individual digits (shape: square)
        saved in the folder 'individual_test' following the pattern 'number_digit.jpg' """
        cnt_right, cnt_wrong = 0, 0
        folder_content = os.listdir("individual_test")
        for imageName in folder_content:
            # image file must be a jpg or png
            assert imageName[-4:] == ".jpg" or imageName[-4:] == ".png"
            correct_res = int(imageName[0])
            # os.path.join builds a portable path without backslash escapes
            image = PIL.Image.open(os.path.join("individual_test", imageName)).convert("L")
            # only square images in this test
            if image.size[0] != image.size[1]:
                print(imageName, " has the wrong proportions: ", image.size, ". It has to be a square.")
                continue
            predicted_res = self.individual_digits(image)
            if predicted_res == []:
                print("No prediction possible for ", imageName)
            else:
                predicted_res = predicted_res[0]
                if predicted_res != correct_res:
                    print("error in partial_img_rec! Predicted ", predicted_res, ". The correct result would have been ", correct_res)
                    cnt_wrong += 1
                else:
                    cnt_right += 1
                    print("correctly predicted ", imageName)
        print(cnt_right, " out of ", cnt_right + cnt_wrong, " digits were correctly recognised. The success rate is therefore ", (cnt_right / (cnt_right + cnt_wrong)) * 100, " %.")

    def multiple_digits(self, img):
        """ takes as input an image without unnecessary whitespace surrounding the digits """
        width, height = img.size
        # start with the first square part of the image
        res_list = self.partial_img_rec(img, (0, 0), (height, height), results=[])
        return "".join(str(elem) for elem in res_list)

    def test_multiple_digits(self):
        """ tests the function 'multiple_digits' using some images saved in the folder 'multi_test'.
        These images contain multiple handwritten digits without much whitespace surrounding them.
        The correct solutions are saved in the files' names followed by the character '_'. """
        cnt_right, cnt_wrong = 0, 0
        folder_content = os.listdir("multi_test")
        for imageName in folder_content:
            # image file must be a jpg or png
            assert imageName[-4:] == ".jpg" or imageName[-4:] == ".png"
            image = PIL.Image.open(os.path.join("multi_test", imageName)).convert("L")
            correct_res = imageName.split("_")[0]
            predicted_res = self.multiple_digits(image)
            if correct_res == predicted_res:
                cnt_right += 1
            else:
                cnt_wrong += 1
                print("Error in multiple_digits! The network predicted ", predicted_res, " but the correct result would have been ", correct_res)
        print("The network predicted correctly ", cnt_right, " out of ", cnt_right + cnt_wrong, " pictures. That's a success rate of ", cnt_right / (cnt_right + cnt_wrong) * 100, "%.")


network = mnist_network()
# this is the image shown above
result = network.individual_digits(PIL.Image.open(os.path.join("individual_test", "7(2)_digit.jpg")))
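What is your test score on the MNIST dataset?
One thing that comes to mind is that your images are missing thresholding.
Thresholding is a technique that sets pixel values below a certain value to zero; see any OpenCV thresholding example. You may need to use inverse thresholding and then check your results again.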
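As a concrete reference for this suggestion, here are NumPy equivalents of the two OpenCV flags, following OpenCV's documented rule (`THRESH_BINARY` sets pixels strictly above `thresh` to `maxval` and the rest to 0; `THRESH_BINARY_INV` does the opposite), plus a gentler variant that only clears faint background. The function names are illustrative and not part of OpenCV. Working through the question's `prepare_image` with them shows that `img[img < 0.5] = 0.` followed by the inverse threshold at 0.1 amounts to an inverted hard binarization at 0.5, which discards the grey anti-aliased stroke edges that MNIST digits have.

```python
import numpy as np

def threshold_binary(img, thresh, maxval=1.0):
    # same rule as cv2.threshold(img, thresh, maxval, cv2.THRESH_BINARY)[1]:
    # pixels strictly above thresh become maxval, all others become 0
    return np.where(img > thresh, maxval, 0.0).astype(img.dtype)

def threshold_binary_inv(img, thresh, maxval=1.0):
    # same rule as cv2.THRESH_BINARY_INV: pixels above thresh become 0,
    # all others become maxval (dark ink on white paper -> white digit on black)
    return np.where(img > thresh, 0.0, maxval).astype(img.dtype)

def zero_background(img, thresh=0.1):
    # only clear faint background noise and keep the grey values of the strokes
    out = img.copy()
    out[out < thresh] = 0.0
    return out
```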
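Give it a try and let us know of any progress.
The main problem you have is that the images you are testing on are different from the MNIST images, possibly because of the image preparation you have done. Can you show the images you are testing with, after applying prepare_image to them?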
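Update:
You have three options to get better performance on this particular task:
- Use a convolutional network, since it performs better on tasks with spatial data such as images and tends to generalize better as a classifier.
- Use, create, and/or generate more pictures of your type and train your network on them so that it learns them as well.
- Preprocess your images so that they align better with the original MNIST images you trained your network on.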
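For the second option the answer does not say how to generate more pictures. One common approach is to make randomly shifted copies of each labelled image, which also counters the position sensitivity described in Update 2. This is a minimal NumPy sketch assuming 28 x 28 arrays that are already preprocessed; `shift_image` and `augment_with_shifts` are hypothetical helpers, and other augmentations (rotation, scaling, stroke thickness) are not shown.

```python
import numpy as np

def shift_image(img, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels, filling the uncovered border with 0."""
    h, w = img.shape
    assert abs(dy) < h and abs(dx) < w
    out = np.zeros_like(img)
    # source and destination slices of the region that stays inside the frame
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out

def augment_with_shifts(images, labels, max_shift=3, copies=5, seed=0):
    """Create `copies` randomly shifted versions of every (28, 28) image."""
    rng = np.random.default_rng(seed)
    new_x, new_y = [], []
    for img, label in zip(images, labels):
        for _ in range(copies):
            dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
            new_x.append(shift_image(img, int(dy), int(dx)))
            new_y.append(label)
    return np.stack(new_x), np.array(new_y)
```

The augmented arrays still have to be flattened to 784 values and one-hot encoded before training, like the MNIST data in the question's `__init__`.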
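I just ran an experiment. I checked the MNIST images for each represented digit, then took your images and applied some of the preprocessing I suggested earlier:
1. Applied some thresholding, only to remove background noise, since the original MNIST data has a minimal threshold only for the blank background:
image[image < 0.1] = 0.
2. Surprisingly, the size of the digit inside the image turned out to be crucial, so I scaled the digit down inside the 28 x 28 image so that there is more padding around it.
3. I inverted the image, since the MNIST data from keras is inverted as well (white digits on a black background).
image = ImageOps.invert(image)
4. Finally, scaled the data, as we did during training:
image = image / 255.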
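The four steps can be combined into one function. This is a sketch, not the answerer's actual code: it assumes a grayscale photo given as a `uint8` NumPy array with dark ink on light paper, reorders the steps so the 0.1 threshold applies to values that are already inverted and scaled to [0, 1], and uses a 20 x 20 target box, which is how the original MNIST description says the digits were size-normalized inside the 28 x 28 frame. `preprocess_like_mnist` and `resize_nearest` are hypothetical names, and the nearest-neighbour resize is cruder than PIL's LANCZOS filter.

```python
import numpy as np

def resize_nearest(a, new_h, new_w):
    # nearest-neighbour resize in pure NumPy
    rows = np.arange(new_h) * a.shape[0] // new_h
    cols = np.arange(new_w) * a.shape[1] // new_w
    return a[rows[:, None], cols[None, :]]

def preprocess_like_mnist(gray, box=20, size=28, noise_thresh=0.1):
    """gray: 2-D uint8 array, dark ink on light paper (values 0..255)."""
    # step 3: invert so the digit is bright on a dark background
    img = 255.0 - gray.astype(np.float32)
    # step 4: scale to [0, 1] as during training
    img /= 255.0
    # step 1: zero out faint background noise only (no hard binarization)
    img[img < noise_thresh] = 0.0
    # step 2: crop to the digit's bounding box ...
    ys, xs = np.nonzero(img)
    if len(ys) == 0:
        return np.zeros((size, size), dtype=np.float32)
    digit = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # ... scale its longer side to `box` pixels, keeping the aspect ratio ...
    h, w = digit.shape
    scale = box / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    digit = resize_nearest(digit, new_h, new_w)
    # ... and paste it in the middle of a size x size canvas, leaving padding
    out = np.zeros((size, size), dtype=np.float32)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    out[top:top + new_h, left:left + new_w] = digit
    return out
```

`out.reshape(784)` then gives the vector that `predict_result` expects.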
After preprocessing, I trained the model on the MNIST dataset with the parameters epochs=12, batch_size=200, with these results:
Result: 1 with probability: 0.6844741106033325
result: 1 . probabilities: [2.0584749904628552e-07, 0.9875971674919128, 5.821426839247579e-06, 4.979299319529673e-07, 0.012240586802363396, 1.1566483948399764e-07, 2.382085284580171e-08, 0.00013023221981711686, 9.620113416985987e-08, 2.5273093342548236e-05]
Result: 6 with probability: 0.9221984148025513
result: 6 . probabilities: [9.130864782491699e-05, 1.8290626258021803e-07, 0.00020504613348748535, 2.1564576968557958e-07, 0.0002401985548203811, 0.04510130733251572, 0.9221984148025513, 1.9014490248991933e-07, 0.03216308355331421, 3.323434683011328e-08]
Result: 7 with probability: 0.7105212807655334
result: 7 . probabilities: [1.0372193770535887e-08, 7.988557626958936e-06, 0.00031014863634482026, 0.0056108818389475346, 2.434678014751057e-09, 3.2280522077599016e-07, 1.4190952857262573e-09, 0.9940618872642517, 1.612859932720312e-06, 7.102244126144797e-06]
Note: your number 9 is a bit tricky:
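When I worked out how the model trained on the MNIST dataset sees a 9, I found two main "features": the upper part and the lower part. An upper part with a nice round shape, as in your image, is not a 9 but mostly a 3 for your model trained on the MNIST dataset. The lower part of a 9, according to the MNIST dataset, is mostly a straightened curve. So basically your perfectly shaped 9 will always be a 3 for your model because of the MNIST samples, unless you train the model again with a sufficient number of samples of your 9 shape. To verify my idea, I ran a sub-experiment with 9s: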
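My 9 with a slanted upper part (mostly fine for a 9 according to MNIST) but a slightly curled lower part (not fine for a 9 according to MNIST):
Result: 9 with probability: 0.5365301370620728
My 9 with a slanted upper part (mostly fine for a 9 according to MNIST) and a straight lower part (fine for a 9 according to MNIST):
Result: 9 with probability: 0.923724353313446
Your 9 with the misinterpreted shape properties:
Result: 3 with probability: 0.8158268928527832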
result: 3 . probabilities: [9.367801249027252e-05, 3.9978775021154433e-05, 0.0001467708352720365, 0.8158268928527832, 0.0005801069783046842, 0.04391581565141678, 6.44062723154093e-08, 7.099170943547506e-06, 0.09051419794559479, 0.048875387758016586]
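Finally, to prove the importance of image scaling (padding), which I mentioned above as crucial:
Result: 3 with probability: 0.9845736622810364
Result: 9 with probability: 0.923724353313446
So we can see that the model picked up some features that make it always classify an oversized shape with little padding inside the image as a 3.
I think we could get better performance with a CNN, but the way you sample and preprocess the data is always crucial for getting the best performance in ML tasks.
I hope this helps.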
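Update 2:
I found another issue, which I also checked and confirmed: the position of the digit in the image matters as well, which makes sense for this type of neural network. A good example: the digits 7 and 9 in the MNIST dataset are placed in the center but close to the bottom of the image, which makes classification harder or wrong if we place the new digits to be classified in the center of the image. I tested this theory by moving the 7s and 9s towards the bottom, leaving more space at the top of the image, and the result was almost 100% accuracy.
Since this is a spatial type of problem, I suppose a CNN could eliminate it more effectively. However, it would be better if MNIST were aligned to the center, or we could avoid the problem programmatically.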
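One way to avoid the problem programmatically is to copy MNIST's own convention: according to the dataset's description, each digit was centered in the 28 x 28 field by its center of mass. The sketch below (the name `center_by_mass` is hypothetical) applies that shift to a 28 x 28 array, for example the output of `prepare_image` before it is flattened. It is a different fix from the manual shift of 7s and 9s towards the bottom, and whether it gives the same improvement is untested here.

```python
import numpy as np

def center_by_mass(img):
    """Shift a 2-D image so its intensity-weighted center of mass sits at the
    image center; pixels shifted out of the frame are dropped, new ones are 0."""
    total = img.sum()
    if total == 0:
        return img.copy()
    h, w = img.shape
    ys, xs = np.indices(img.shape)
    # intensity-weighted mean row and column
    cy = (ys * img).sum() / total
    cx = (xs * img).sum() / total
    dy = int(round((h - 1) / 2 - cy))
    dx = int(round((w - 1) / 2 - cx))
    out = np.zeros_like(img)
    # copy the overlapping region into its shifted position
    src_y = slice(max(0, -dy), h - max(0, dy))
    dst_y = slice(max(0, dy), h - max(0, -dy))
    src_x = slice(max(0, -dx), w - max(0, dx))
    dst_x = slice(max(0, dx), w - max(0, -dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out
```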