TensorFlow/Keras - 预期 global_average_pooling2d_1_input 具有形状 (1, 1, 2048) 但得到形状为 (7, 7, 2048) 的数组

Question

我对 TensorFlow 和图像分类还很陌生，所以我可能缺少关键知识，这可能就是我遇到这个问题的原因。

我已经在 TensorFlow 中构建了一个 ResNet50 模型，目的是使用 ImageNet 库对犬种进行图像分类，并且我已经成功地训练了一个可以检测各种犬种的神经网络。

我现在想将狗的随机图像传递给我的模型，以便它输出它认为狗的品种是什么的输出。但是，当我运行这个函数 dog_breed_predictor("<file path to image>") 时，我在它尝试执行行 Resnet50_model.predict(bottleneck_feature) 时收到错误 expected global_average_pooling2d_1_input to have shape (1, 1, 2048) but got array with shape (7, 7, 2048) 并且我不知道如何解决这个问题.

这是代码。我已经提供了所有我认为与问题相关的内容。

import cv2
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from tqdm import tqdm

from sklearn.datasets import load_files
np_utils = tf.keras.utils

# define function to load train, test, and validation datasets
def load_dataset(path):
    data = load_files(path)
    dog_files = np.array(data['filenames'])
    dog_targets = np_utils.to_categorical(np.array(data['target']), 133)
    return dog_files, dog_targets

# load train, test, and validation datasets
train_files, train_targets = load_dataset('dogImages/dogImages/train')
valid_files, valid_targets = load_dataset('dogImages/dogImages/valid')
test_files, test_targets = load_dataset('dogImages/dogImages/test')

#define Resnet50 model
Resnet50_model = ResNet50(weights="imagenet")

def path_to_tensor(img_path):
    #loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(224, 224))
    #convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    #convert 3D tensor into 4D tensor with shape (1, 224, 224, 3)
    return np.expand_dims(x, axis=0)

from keras.applications.resnet50 import preprocess_input, decode_predictions

def ResNet50_predict_labels(img_path):
    #returns prediction vector for image located at img_path
    img = preprocess_input(path_to_tensor(img_path))
    return np.argmax(Resnet50_model.predict(img))

###returns True if a dog is detected in the image stored at img_path
def dog_detector(img_path):
    prediction = ResNet50_predict_labels(img_path)
    return ((prediction <= 268) & (prediction >= 151))

###Obtain bottleneck features from another pre-trained CNN
bottleneck_features = np.load("bottleneck_features/DogResnet50Data.npz")
train_DogResnet50 = bottleneck_features["train"]
valid_DogResnet50 = bottleneck_features["valid"]
test_DogResnet50 = bottleneck_features["test"]

###Define your architecture
Resnet50_model = tf.keras.Sequential()
Resnet50_model.add(tf.keras.layers.GlobalAveragePooling2D(input_shape=train_DogResnet50.shape[1:]))
Resnet50_model.add(tf.contrib.keras.layers.Dense(133, activation="softmax"))

Resnet50_model.summary()

###Compile the model
Resnet50_model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=["accuracy"])
###Train the model
checkpointer = tf.keras.callbacks.ModelCheckpoint(filepath="saved_models/weights.best.ResNet50.hdf5",
                                                 verbose=1, save_best_only=True)

Resnet50_model.fit(train_DogResnet50, train_targets,
                  validation_data=(valid_DogResnet50, valid_targets),
                  epochs=20, batch_size=20, callbacks=[checkpointer])

###Load the model weights with the best validation loss.
Resnet50_model.load_weights("saved_models/weights.best.ResNet50.hdf5")

###Calculate classification accuracy on the test dataset
Resnet50_predictions = [np.argmax(Resnet50_model.predict(np.expand_dims(feature, axis=0))) for feature in test_DogResnet50]

#Report test accuracy
test_accuracy = 100*np.sum(np.array(Resnet50_predictions)==np.argmax(test_targets, axis=1))/len(Resnet50_predictions)
print("Test accuracy: %.4f%%" % test_accuracy)

def extract_Resnet50(tensor):
    from keras.applications.resnet50 import ResNet50, preprocess_input
    return ResNet50(weights='imagenet', include_top=False).predict(preprocess_input(tensor))

def dog_breed(img_path):
    #extract bottleneck features
    bottleneck_feature = extract_Resnet50(path_to_tensor(img_path))
    #obtain predicted vector
    predicted_vector = Resnet50_model.predict(bottleneck_feature) #shape error occurs here
    #return dog breed that is predicted by the model
    return dog_names[np.argmax(predicted_vector)]

def dog_breed_predictor(img_path):
    #determine the predicted dog breed
    breed = dog_breed(img_path)
    #display the image
    img = cv2.imread(img_path)
    cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    plt.imshow(cv_rgb)
    plt.show()
    #display relevant predictor result
    if dog_detector(img_path):
        print("This is a dog and its breed is: " + str(breed))
    elif face_detector(img_path):
        print("This is a human but it looks like a: " + str(breed))
    else:
        print("I don't know what this is.")

dog_breed_predictor("dogImages/dogImages/train/016.Beagle/Beagle_01126.jpg")

我输入我的函数的图像来自用于训练模型的同一数据集 - 我想看看自己是否模型按预期工作 - 所以这个错误使它更加混乱。我可能做错了什么？

Answer 1

ResNet50 的文档对构造函数参数 input_shape 做了一些说明（重点是我的）：

input_shape: optional shape tuple, only to be specified if include_top is False (otherwise the input shape has to be (224, 224, 3) (with 'channels_last' data format) or (3, 224, 224) (with 'channels_first' data format). It should have exactly 3 inputs channels, and width and height should be no smaller than 197. E.g. (200, 200, 3) would be one valid value.

我的猜测是，由于您将 include_top 指定为 False，网络定义将输入填充到比 224x224 更大的形状，因此当您提取特征时，您最终会得到一个特征图和没有特征向量（这就是你出错的原因）。

只需尝试以这种方式指定和input_shape：

return ResNet50(weights='imagenet',
                include_top=False,
                input_shape=(224, 224, 3)).predict(preprocess_input(tensor))

Answer 2

感谢nessuno的帮助，我解决了这个问题。问题确实出在 ResNet50 的 pooling 层。

我上面脚本中的以下代码：

return ResNet50(weights='imagenet',
                include_top=False).predict(preprocess_input(tensor))

returns一个(1, 7, 7, 2048)的形状（诚然，我并不完全理解为什么）。为了解决这个问题，我在参数 pooling="avg" 中添加了这样的内容：

return ResNet50(weights='imagenet',
                include_top=False,
                pooling="avg").predict(preprocess_input(tensor))

这个 returns 的形状 (1, 2048) （再次承认，我不知道 为什么 .)

但是，该模型仍需要 4 维形状。为了解决这个问题，我在 dog_breed() 函数中添加了以下代码：

print(bottleneck_feature.shape) #returns (1, 2048)
bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
print(bottleneck_feature.shape) #returns (1, 1, 1, 1, 2048) - yes a 5D shape, not 4.

和这个returns一个(1, 1, 1, 1, 2048)的形状。出于某种原因，当我只添加 2 个维度时，模型仍然抱怨它是 3D 形状，但是当我添加第 3 个维度时模型停止了（这很奇怪，我想进一步了解这是为什么。）。

总的来说，我的 dog_breed() 函数来自：

def dog_breed(img_path):
    #extract bottleneck features
    bottleneck_feature = extract_Resnet50(path_to_tensor(img_path))
    #obtain predicted vector
    predicted_vector = Resnet50_model.predict(bottleneck_feature) #shape error occurs here
    #return dog breed that is predicted by the model
    return dog_names[np.argmax(predicted_vector)]

对此：

def dog_breed(img_path):
    #extract bottleneck features
    bottleneck_feature = extract_Resnet50(path_to_tensor(img_path))
    print(bottleneck_feature.shape) #returns (1, 2048)
    bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
    bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
    bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
    print(bottleneck_feature.shape) #returns (1, 1, 1, 1, 2048) - yes a 5D shape, not 4.
    #obtain predicted vector
    predicted_vector = Resnet50_model.predict(bottleneck_feature) #shape error occurs here
    #return dog breed that is predicted by the model
    return dog_names[np.argmax(predicted_vector)]

同时确保将参数 pooling="avg" 添加到我对 ResNet50 的调用中。

TensorFlow/Keras - 预期 global_average_pooling2d_1_input 具有形状 (1, 1, 2048) 但得到形状为 (7, 7, 2048) 的数组

TensorFlow/Keras - expected global_average_pooling2d_1_input to have shape (1, 1, 2048) but got array with shape (7, 7, 2048)

python

keras

tensorflow

resnet

imagenet