Issue with Imagenet classification with VGG16 pretrained weights

Question

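I am trying to run vanilla ImageNet classification with the VGG16 network in TensorFlow (with VGG16 provided through the Keras backbone).

However, when I try to classify a sample elephant image, it gives completely unexpected results.

I cannot figure out what the issue might be.

Here is the complete code I am using: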

import tensorflow as tf
import numpy as np
from PIL import Image
from tensorflow.python.keras._impl.keras.applications import imagenet_utils


model = tf.keras.applications.VGG16()
VGG = model.graph

VGG.get_operations()
input = VGG.get_tensor_by_name("input_1:0")
output = VGG.get_tensor_by_name("predictions/Softmax:0")
print(input)
print(output)

I = Image.open("Elephant.jpg")
new_img = I.resize((224,224))
image_array = np.array(new_img)[:, :, 0:3]
image_array = np.expand_dims(image_array, axis=0)


with tf.Session(graph=VGG) as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    pred = (sess.run(output,{input:image_array}))
    print(imagenet_utils.decode_predictions(pred))

Below is the sample output I get:

Tensor("input_1:0", shape=(?, 224, 224, 3), dtype=float32)
Tensor("predictions/Softmax:0", shape=(?, 1000), dtype=float32)

[[('n02281406', 'sulphur_butterfly', 0.0022673723), ('n01882714', 'koala', 0.0021256246), ('n04325704', 'stole', 0.0020583202), ('n01496331', 'electric_ray', 0.0020416214), ('n01797886', 'ruffed_grouse', 0.0020229272)]]

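Judging by the probabilities, something seems to be wrong with the image data being passed in (all of the scores are very low).

But I cannot figure out what is going wrong.
And as a human, I am quite sure the image shows an elephant!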

Answer 1

I think there are 2 mistakes. The first is that you have to rescale the image by dividing all pixels by 255.

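I = Image.open("Elephant.jpg")
new_img = I.resize((224,224))
# Cast to float first: in-place division on a uint8 array raises a casting error
image_array = np.array(new_img)[:, :, 0:3].astype(np.float32)
image_array /= 255.
image_array = np.expand_dims(image_array, axis=0)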

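The second point came to me when I looked at the prediction values. You have a vector of 1000 elements, and after rescaling every element has a prediction of about 0.1%. That means you have a non-trained model. I don't know how the loading works in TensorFlow, but in Keras you can do this: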

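from keras import applications

app = applications.vgg16
model = app.VGG16(
        include_top=True,     # keep the standard ImageNet classifier head (False would drop it)
        weights='imagenet',   # load the pretrained weights; weights=None gives random weights
        pooling="avg")        # only takes effect when include_top=False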

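From what I have read, you have to download a separate file containing the weights, for example from GitHub.

Hope this helps,

Edit 1:

I tried the same model using Keras: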

from keras.applications.vgg16 import VGG16, decode_predictions
from PIL import Image
import numpy as np

model = VGG16(weights='imagenet')

I = Image.open("Elephant.jpg")
new_img = I.resize((224,224))
image_array = np.array(new_img)[:, :, 0:3]
image_array = image_array/255.
x = np.expand_dims(image_array, axis=0)

preds = model.predict(x)
print('Predicted:', decode_predictions(preds, top=5)[0])

If I comment out the rescaling, I get a wrong prediction:

Predicted: [('n03788365', 'mosquito_net', 0.22725257), ('n15075141', 'toilet_tissue', 0.026636025), ('n04209239', 'shower_curtain', 0.019786758), ('n02804414', 'bassinet', 0.01353887), ('n03131574', 'crib', 0.01316699)]

Without the rescaling, the prediction is good:

Predicted: [('n02504458', 'African_elephant', 0.95870858), ('n01871265', 'tusker', 0.040065952), ('n02504013', 'Indian_elephant', 0.0012253703), ('n01704323', 'triceratops', 5.0949382e-08), ('n02454379', 'armadillo', 5.0408511e-10)]

Now, if I remove the weights, I get roughly the "same" thing I got in TensorFlow:

Predicted: [('n07717410', 'acorn_squash', 0.0010033853), ('n02980441', 'castle', 0.0010028203), ('n02124075', 'Egyptian_cat', 0.0010028186), ('n04179913', 'sewing_machine', 0.0010027955), ('n02492660', 'howler_monkey', 0.0010027081)]

To me, this means the weights were not applied in your case. Maybe they were downloaded but not used.
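As a quick sanity check on the "about 0.1% everywhere" reasoning: a 1000-way softmax over nearly equal logits gives roughly 1/1000 per class, which is the pattern seen in the no-weights output. The sketch below uses made-up logits (the values and the `softmax` helper are illustrative, not taken from VGG16):

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


random.seed(0)
# Hypothetical near-zero logits, standing in for an untrained classifier head
logits = [random.uniform(-0.01, 0.01) for _ in range(1000)]
probs = softmax(logits)

# The five largest probabilities all sit very close to 1/1000
top5 = sorted(probs, reverse=True)[:5]
print(top5)
```

A trained model, by contrast, concentrates most of the probability mass on a few classes, as in the African_elephant output.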

Answer 2

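It seems we can (or need to?) use the session from Keras, which has the associated graph with the weights loaded, instead of creating a new session in TensorFlow and using the graph obtained from the Keras model as shown below: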

VGG = model.graph

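I think the graph above does not carry the weights (which is why the predictions are wrong), while the graph in the Keras session has the proper weights (so the two graph instances should be different).

The complete code is as follows: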
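One possible reading of this, offered as a sketch rather than a confirmed diagnosis: in TF1-style graph mode the graph only describes the computation, while variable values are held by each session. A fresh tf.Session on the same graph therefore does not see the values Keras loaded, and running an initializer in it assigns initial values instead of the pretrained ones. The toy example below assumes the `tf.compat.v1` API is available; the scalar variable `w` and the value 5.0 are illustrative stand-ins for the VGG16 weights.

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # use TF1-style graphs and sessions

graph = tf1.Graph()
with graph.as_default():
    w = tf1.Variable(0.0, name="w")      # stand-in for a model weight
    load_weights = w.assign(5.0)         # stand-in for loading pretrained values
    init_op = tf1.global_variables_initializer()

# Session A plays the role of the Keras session: weights get loaded into it
sess_a = tf1.Session(graph=graph)
sess_a.run(init_op)
sess_a.run(load_weights)

# Session B is a brand-new session on the same graph: it only runs the initializer
sess_b = tf1.Session(graph=graph)
sess_b.run(init_op)

value_a = sess_a.run(w)  # value held by session A
value_b = sess_b.run(w)  # value held by session B
print(value_a, value_b)

sess_a.close()
sess_b.close()
```

The two sessions share one graph but report different values for `w`, which is consistent with reusing the Keras session instead of opening a new one.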

import tensorflow as tf
import numpy as np
from PIL import Image
from tensorflow.python.keras._impl.keras.applications import imagenet_utils
from tensorflow.python.keras._impl.keras import backend as K


model = tf.keras.applications.VGG16()
sess = K.get_session()
VGG = model.graph #Not needed and also doesnt have weights in it

VGG.get_operations()
input = VGG.get_tensor_by_name("input_1:0")
output = VGG.get_tensor_by_name("predictions/Softmax:0")
print(input)
print(output)

I = Image.open("Elephant.jpg")
new_img = I.resize((224,224))
image_array = np.array(new_img)[:, :, 0:3]
image_array = np.expand_dims(image_array, axis=0)
image_array = image_array.astype(np.float32)
image_array = tf.keras.applications.vgg16.preprocess_input(image_array)

pred = (sess.run(output,{input:image_array}))
print(imagenet_utils.decode_predictions(pred))

This gives the expected result:

[[('n02504458', 'African_elephant', 0.8518132), ('n01871265', 'tusker', 0.1398836), ('n02504013', 'Indian_elephant', 0.0082286), ('n01704323', 'triceratops', 6.965483e-05), ('n02397096', 'warthog', 1.8662439e-06)]]

Thanks to Idavid for the tip about using the preprocess_input() function, and to Nicolas for the tip about the weights not being loaded.
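For reference, the default preprocessing that preprocess_input() applies for VGG16 (the "caffe" mode) converts RGB to BGR and subtracts per-channel ImageNet means, without dividing by 255. The NumPy sketch below mirrors that behaviour; the helper name `caffe_style_preprocess` is hypothetical and is not part of Keras.

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as used by the "caffe" preprocessing mode
BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def caffe_style_preprocess(rgb_batch):
    """Hypothetical helper mirroring the default VGG16 preprocess_input behaviour."""
    x = np.asarray(rgb_batch, dtype=np.float32)
    x = x[..., ::-1]        # reorder channels: RGB -> BGR
    return x - BGR_MEANS    # zero-center each channel; no division by 255


# A synthetic batch holding one pure-red 224x224 image (R=255, G=0, B=0)
batch = np.zeros((1, 224, 224, 3), dtype=np.uint8)
batch[..., 0] = 255

out = caffe_style_preprocess(batch)
print(out.shape, out[0, 0, 0])
```

In real code, calling tf.keras.applications.vgg16.preprocess_input as in the answer above is the safer choice; the sketch is only meant to show what kind of input the pretrained weights expect.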

Tags: python, computer-vision, conv-neural-network, tensorflow, imagenet