Predicting class label for single image in image classification

Question

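I am designing static hand gesture recognition using a deep learning neural network. I started from this implementation on Kaggle (https://www.kaggle.com/ranjeetjain3/deep-learning-using-sign-langugage/notebook#Sign-Language). The accuracy of this looks very high, but when I try predictions for custom images, I am getting wrong results. As a newbie, I doubt my own interpretation and need help.

Below is my prediction code: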

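import numpy as np
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

# Read the image (raw string, so backslash sequences such as "\7" are not treated as escapes)
infer_image = mpimg.imread(r'D:\Mayuresh\DL using SL MNIST\input\infer\7.png')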
plt.imshow(infer_image) 

# Resizing before the prediction
infer_image = np.resize(infer_image, (28,28,1))
infer_image_arr = np.array(infer_image)
infer_image_arr = infer_image_arr.reshape(1,28,28,1)

# Prediction
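# predict_classes was removed from Keras in TensorFlow 2.6; argmax over predict() is the equivalent
y_pred = np.argmax(model.predict(infer_image_arr), axis=-1)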
print(y_pred)

# Dictionary for translation
my_dict2 = {
    0: 'a',
    1: 'b',
    2: 'c',
    3: 'd',
    4: 'e',
    5: 'f',
    6: 'g',
    7: 'h',
    8: 'i',
    9: 'k',
    10: 'l',
    11: 'm',
    12: 'n',
    13: 'o',
    14: 'p',
    15: 'q',
    16: 'r',
    17: 's',
    18: 't',
    19: 'u',
    20: 'v',
    21: 'w',
    22: 'x',
    23: 'y'
}

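# y_pred holds one class index per input image; take the first (and only) one
print(my_dict2[int(y_pred[0])])

Can someone suggest changes, or a snippet, for predicting the gesture in a single image?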

Answer 1

I assume you did not train anything for your project and are using the neural network weights provided on the Kaggle site.

The accuracy of this looks very high, but when I try predictions for custom images, I am getting wrong results.

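The network you are using appears to be overfit to the MNIST dataset, so it gives poor results when you feed it different images.

What you should do is create a hand gesture dataset that covers many cases, especially the ones you want to detect. Then train your network on this new dataset, using the current weights as the initial weights for training, so that the network learns the different gesture cases. The key to improving accuracy in your project is to train the network on varied gesture images that resemble your inference input.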
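Separately from retraining, it may be worth checking that a custom image reaches the network in the same form as the training images. In the question's code, np.resize does not scale the picture: it flattens the array and repeats or truncates values to fill the new shape, so the 28x28 input no longer resembles the original hand. The sketch below is one possible NumPy-only preprocessing step, not the notebook's own method. The helper names (to_grayscale, resize_nearest, prepare_for_model) are made up for illustration, and it assumes the model expects a single-channel 28x28 input whose value range matches the training data (for example, both scaled to [0, 1]).

```python
import numpy as np

def to_grayscale(img):
    """Collapse an H x W x 3 (or 4) image to H x W; 2D input is returned as float32."""
    img = np.asarray(img, dtype=np.float32)
    if img.ndim == 2:
        return img
    # Standard luminance weights; an alpha channel, if present, is ignored.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return img[..., :3] @ weights

def resize_nearest(img, size=(28, 28)):
    """Spatially resize a 2D image by nearest-neighbour sampling."""
    h, w = img.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[np.ix_(rows, cols)]

def prepare_for_model(img, size=(28, 28)):
    """Return a (1, H, W, 1) batch. Value scaling is left to the caller (assumption)."""
    small = resize_nearest(to_grayscale(img), size)
    return small.reshape(1, size[0], size[1], 1)

# Synthetic 56x56 RGB image: left half black, right half white.
demo = np.zeros((56, 56, 3), dtype=np.float32)
demo[:, 28:, :] = 1.0
batch = prepare_for_model(demo)
print(batch.shape)
```

With the model from the question, the result could be passed on as model.predict(prepare_for_model(infer_image)). Nearest-neighbour sampling is only the simplest option; a library routine such as Pillow's Image.resize or OpenCV's cv2.resize would normally give smoother downscaling.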

Answer 2

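I believe you need a codebase in which you can train your model on a dataset you generate yourself, one that reflects your own background, lighting setup, and so on. This works better than using a pretrained model that was trained on data from a different distribution.

I suggest using this, which can detect 13 hand gestures: you can start a video feed, and training images will be captured automatically for each gesture. You can also select the number of classes you want, which can improve your performance. Alternatively, you can use the original code, Emojinator.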

image-processing

gesture-recognition

object-detection

neural-network

deep-learning