预期 ndim=3,发现 ndim=4。在 keras 后端使用 K.function() 获取模型中的中间层时

expected ndim=3, found ndim=4. when using K.function() in keras backend to get intermediate layers in a model

我正在尝试提取根据某些数据训练的分类模型的最后一层。第一层是 Embedding 层,然后是 bilstm 层,然后是输出密集层。我的代码播种在下面。我一直得到 4d 输出 (1,38,300,300) 而不是 3d (1,38,300)。 1是样本大小,38是句子的最大长度,300是word2vec长度。

from keras import backend as K
from tensorflow.keras.models import load_model
import numpy as np
import gensim
word2vec = 'GoogleNews-vectors-negative300.txt'


x_matrix = np.zeros((1, 38, 300))
sentene_label = 'the weather today was extremely unpredictable,0'
parts = sentene_label.split(',')
label = int(parts[1])  
sentence = parts[0] 

words = sentence.split(' ')
words = words[:x_matrix.shape[1]]  
for j, word in enumerate(words):
    if word in word2vec:
        # x_matrix[0, j, :] = word2vec[word]
        x_matrix[0, j, :] = loaded_model.word_vec(word)


model = load_model('TrainedModel.h5')
get_3rd_layer_output = K.function([model.layers[0].input], [model.layers[2].output])  
layer_output = get_3rd_layer_output(x_matrix)[0]
print("Layer Output Shape 1 : ", layer_output.shape)

我已经多次交叉检查我的代码,但我似乎无法弄清楚为什么尺寸是错误的。

这是回溯

Traceback (most recent call last):
  File "/usr/pkg/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3427, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-bb840b495480>", line 1, in <module>
    runfile('/am/vuwstocoisnrin1.vuw.ac.nz/ecrg-solar/kosimadukwe/Data Augmentation/test.py', wdir='/am/vuwstocoisnrin1.vuw.ac.nz/ecrg-solar/kosimadukwe/Data Augmentation')
  File "/am/embassy/vol/x6/jetbrains/apps/PyCharm-P/ch-0/201.7846.77/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/am/embassy/vol/x6/jetbrains/apps/PyCharm-P/ch-0/201.7846.77/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/am/vuwstocoisnrin1.vuw.ac.nz/ecrg-solar/kosimadukwe/Data Augmentation/test.py", line 451, in <module>
    layer_output = get_3rd_layer_output(x_matrix)[0]
  File "/usr/pkg/lib/python3.8/site-packages/tensorflow/python/keras/backend.py", line 4073, in func
    outs = model(model_inputs)
  File "/usr/pkg/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1012, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/usr/pkg/lib/python3.8/site-packages/tensorflow/python/keras/engine/functional.py", line 424, in call
    return self._run_internal_graph(
  File "/usr/pkg/lib/python3.8/site-packages/tensorflow/python/keras/engine/functional.py", line 560, in _run_internal_graph
    outputs = node.layer(*args, **kwargs)
  File "/usr/pkg/lib/python3.8/site-packages/tensorflow/python/keras/layers/wrappers.py", line 539, in __call__
    return super(Bidirectional, self).__call__(inputs, **kwargs)
  File "/usr/pkg/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 998, in __call__
    input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
  File "/usr/pkg/lib/python3.8/site-packages/tensorflow/python/keras/engine/input_spec.py", line 219, in assert_input_compatibility
    raise ValueError('Input ' + str(input_index) + ' of layer ' +
ValueError: Input 0 of layer bidirectional_9 is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (1, 38, 300, 300)

线上触发错误

layer_output = get_3rd_layer_output(x_matrix)[0]

调用get_3rd_layer_output前x_matrix的形状是

The shape of X matrix : (60, 38, 300)

TrainedModels 架构

model = Sequential()
model.add(Embedding(vocab_size, 300, input_length=38, weights=[embedding_matrix], trainable=True))
model.add(Bidirectional(LSTM(100, dropout=0.2)))
model.add(Dense(3, activation='sigmoid'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='Adagrad', metrics=['accuracy'])
model.summary()

es = EarlyStopping(monitor='val_loss', mode='min', baseline=0.3, patience=100, verbose=1)
mc = ModelCheckpoint('TrainedModel.h5', monitor='val_loss', mode='min', verbose=1, save_best_only=True)
hist = model.fit(train_sequences, train_y, epochs=200, verbose=False, batch_size=100,validation_data=(val_sequences, val_y),callbacks=[es, mc]) 

TrainedModels model.summary 是

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_9 (Embedding)      (None, 38, 300)           7370400   
_________________________________________________________________
bidirectional_9 (Bidirection (None, 200)               320800    
_________________________________________________________________
dense_9 (Dense)              (None, 3)                 603       
=================================================================
Total params: 7,691,803
Trainable params: 7,691,803
Non-trainable params: 0
_________________________________________________________________

获得任何中间层输出的正确方法是创建一个子模型,期望与您训练的模型具有相同的输入。在您的情况下,出现错误是因为您将 3D 嵌入矩阵传递给训练模型,同时您必须传递用于训练的相同数据(2D 数据和整数编码字)。

这里我生成了一个虚拟示例来正确地从您的模型中提取任何中间输出。

创建虚拟数据:

vocab_size = 111
emb_size = 300
input_length = 38
n_sample = 50
n_classes = 3

embedding_matrix = np.random.uniform(-1,1, (vocab_size, emb_size))
X = np.random.randint(0,vocab_size, (n_sample, input_length))
Y = np.random.randint(0,n_classes, (n_sample,))

创建模型并训练:

import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
from tensorflow.keras import backend as K

model = Sequential()
model.add(Embedding(vocab_size, emb_size, input_length=input_length, 
                    weights=[embedding_matrix], trainable=True))
model.add(Bidirectional(LSTM(100, dropout=0.2)))
model.add(Dense(n_classes, activation='sigmoid'))
model.compile(loss='sparse_categorical_crossentropy', 
              optimizer='Adagrad', metrics=['accuracy'])
model.fit(X,Y, epochs=3)  ### TRAINED WITH X

获取图层输出:

layer_id = 2
get_layer_output = K.function([model.layers[0].input], [model.layers[layer_id].output])
layer_output = get_layer_output(X)[0]  ### EXTRACT FROM X
# equal to:
# sub_model = Model(model.input, model.layers[layer_id].output)
# layer_output = sub_model.predict(X)  ### EXTRACT FROM X