Neural Networks - Multiple object detection in one image with confidence

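I understand how CNNs solve classification problems, for example on the MNIST dataset, where each image represents one handwritten digit. An image is evaluated and a classification is returned with some confidence.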

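I would like to know what approach to take if I want to identify several objects in one image, with a confidence for each. For example, if I evaluate an image of a cat and a dog, I would like both 'cat' and 'dog' to come out with high confidence. I do not care where in the picture the objects are.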

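With my current knowledge, I would build one dataset of images containing only dogs and one of images containing only cats. I would then retrain the top layer of an Inception V3 network so that it could tell which images are cats and which are dogs.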

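The problem with this is that evaluating an image of a dog and a cat would give roughly 50% dog and 50% cat, because the network is trying to classify the image. What I want is to 'tag' the image (ideally reaching ~100% dog, ~100% cat).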

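I have looked briefly at region-based CNNs, which solve a similar problem, but I don't care where in the picture the objects are, only that each of them can be identified.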

What approaches exist for this problem? I would like to implement it in Python using something like Tensorflow or Keras.

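First, to make this easy to understand, suppose you have 2 separate neural networks: one only recognizes whether a cat is in the image, and the other only recognizes whether a dog is in the image. The neurons in each will learn to do that pretty well.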

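More interestingly, these 2 networks can be combined into a single network that shares weights and has 2 outputs, one for dog and one for cat. To do that, you only need to note two things: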

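The 2 classes (cat and dog) can appear in the same image, so [cat_label, dog_label] = {[0, 0], [0, 1], [1, 0], [1, 1]}. This is unlike MNIST or an ordinary classification model, where [cat_label, dog_label] = {[0, 1], [1, 0]} (one-hot labels).
At prediction time, you can choose a threshold to decide whether each animal is present. For example, if y_cat > 0.5 and y_dog > 0.5, then both a cat and a dog are in the image.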
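The two points above can be sketched in plain Python. This is only an illustration: the names CLASSES, encode_tags and decode_tags are hypothetical, and the 0.5 threshold is the example value from the answer, not a tuned setting.

```python
# Hypothetical helpers; CLASSES fixes an assumed order for the label vector.
CLASSES = ["cat", "dog"]

def encode_tags(tags, classes=CLASSES):
    """Build a multi-hot label: 1.0 for every class present, 0.0 otherwise."""
    present = set(tags)
    return [1.0 if name in present else 0.0 for name in classes]

def decode_tags(scores, classes=CLASSES, threshold=0.5):
    """Return every class whose independent score exceeds the threshold."""
    return [name for name, score in zip(classes, scores) if score > threshold]

print(encode_tags(["cat", "dog"]))  # [1.0, 1.0]
print(encode_tags([]))              # [0.0, 0.0]
print(decode_tags([0.93, 0.88]))    # ['cat', 'dog']
print(decode_tags([0.93, 0.12]))    # ['cat']
```

Because each score is compared with the threshold on its own, an image can end up with both tags, one tag, or none.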

Hope this helps!

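I know this is an old question, but in case it shows up on the first page of anyone else's Google search (as it did for me), I thought I would chime in with something that may help.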

The last layer of InceptionV3 is a Softmax function, which tries to say that this is label A OR label B.

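However, if you want to modify something like Inception for multi-label classification, then instead of using Softmax in the final layer you want to swap it out for something like Sigmoid, so that each label is measured on its own merits (rather than compared against its neighbors).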
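To see the difference on concrete numbers, here is a small standard-library sketch. The logit values are made up for illustration and do not come from a real network.

```python
import math

def softmax(logits):
    """Normalize logits into scores that compete and sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Score one logit on its own, independent of the others."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

# Assumed logits for an image with strong evidence for both cat and dog.
logits = [4.0, 4.0]

print(softmax(logits))                         # [0.5, 0.5]
print([round(sigmoid(x), 3) for x in logits])  # [0.982, 0.982]
```

With Softmax the two classes split the probability mass, which is the 50% dog / 50% cat result described in the question. With Sigmoid both labels can be close to 1 at the same time.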

More on the reasoning behind this (and a full description of how to modify retrain.py) can be found here:

https://towardsdatascience.com/multi-label-image-classification-with-inception-net-cbb2ee538e30

The add_final_training_ops() method originally added a new softmax and fully-connected layer for training. We just need to replace the softmax function with a different one.

Why?

The softmax function squashes all values of a vector into the range [0,1], summing to 1, which is exactly what we want in single-label classification. But for our multi-label case, we would like the resulting class probabilities to be able to express that an image of a car belongs to class car with 90% probability and to class accident with 30% probability, and so on. We can achieve that by using, for example, a sigmoid function. Specifically, we will replace:

final_tensor = tf.nn.softmax(logits, name=final_tensor_name)

with:

final_tensor = tf.nn.sigmoid(logits, name=final_tensor_name)

We also have to update the way cross entropy is calculated to properly train our network:

Again, simply replace softmax with sigmoid:

cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=ground_truth_input, logits=logits)
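For intuition about what this loss computes, below is a plain-Python sketch of the per-label formula given in the TensorFlow documentation for sigmoid_cross_entropy_with_logits, max(x, 0) - x * z + log(1 + exp(-|x|)), where x is the logit and z is the 0/1 label. The function names are hypothetical, and averaging over the labels is an assumption made for this example.

```python
import math

def sigmoid_cross_entropy(logit, label):
    """Numerically stable cross entropy for one label scored by a sigmoid."""
    # max(x, 0) - x * z + log(1 + exp(-|x|))
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def multilabel_loss(logits, labels):
    """Average the independent per-label losses (averaging is an assumption)."""
    losses = [sigmoid_cross_entropy(x, z) for x, z in zip(logits, labels)]
    return sum(losses) / len(losses)

# An image tagged with both cat and dog: labels are multi-hot, not one-hot.
labels = [1.0, 1.0]
print(multilabel_loss([4.0, 4.0], labels))   # small: both tags predicted
print(multilabel_loss([4.0, -4.0], labels))  # larger: the dog tag is missed
```

Each label contributes its own term, so the network is not penalized for predicting cat and dog together.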

image-processing

object-detection

neural-network

conv-neural-network

tensorflow