LSTM using word embeddings and TFIDF vectors

Question

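I am trying to run an LSTM on a dataset that has text attributes and TFIDF vectors. I word-embed the text and feed the embeddings into an LSTM layer. Next, I concatenate the LSTM output with the TFIDF vectors. However, the second statement in the code below (the LSTM line) raises the following error: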

"ValueError: Layer lstm_1 was called with an input that isn't a symbolic tensor. Received type: . Full input: []. All inputs to the layer should be tensors."

The code is as follows, where len(term_Index)+1 = 9891, emb_Dim=100, emb_Mat contains floats and has shape [9891,100], and sen_Len=1000:

    embed = Embedding(len(term_Index) + 1, emb_Dim, weights=[emb_Mat], 
    input_length=sen_Len, trainable=False)
    lstm = LSTM(60, dropout=0.1, recurrent_dropout=0.1)(embed)
    tfidf_i = Input(shape=(max_terms_art,))
    conc = Concatenate()(lstm, tfidf_i)
    drop = Dropout(0.2)(conc)
    dens = Dense(1)(drop)
    acti = Activation('sigmoid')(dens)

    model = Model([embed, tfidf_i], acti)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics = ['accuracy'])
    history = model.fit([features_Train, TFIDF_Train], target_Train, epochs = 50, batch_size=128, validation_split=0.20)

Answer 1

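I can't seem to reproduce your exact error. After adding the missing brackets, the code runs fine: the Embedding layer is applied to an Input tensor (the LSTM has to be called on a tensor, not on the layer object itself), and the two inputs of Concatenate are passed as a list. See the code below: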

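    import tensorflow as tf
    from tensorflow.keras import Model
    from tensorflow.keras.layers import Input, Embedding, LSTM, Concatenate, Dropout, Dense, Activation

    # Random stand-ins with the same shapes as the data in the question.
    emb_Mat = tf.random.normal((9891,100)).numpy()     # pretrained embedding matrix
    term_Index = tf.random.uniform((9890,)).numpy()    # only its length is used here
    sen_Len=1000
    emb_Dim=100
    max_terms_art=500

    # Apply the Embedding layer to an Input tensor, then call the LSTM on the
    # resulting tensor. With real data the token input would presumably be
    # Input(shape=(sen_Len,)); len(term_Index) is only used for this demo.
    inp = Input(shape=(len(term_Index),))
    embed = Embedding(len(term_Index) + 1, emb_Dim, weights=[emb_Mat], input_length=sen_Len, trainable=False)(inp)
    lstm = LSTM(60, dropout=0.1, recurrent_dropout=0.1)(embed)

    # Second input: one TFIDF vector per sample.
    tfidf_i = Input(shape=(max_terms_art,))

    # Concatenate expects a single list of tensors, not separate arguments.
    conc = Concatenate()([lstm, tfidf_i])
    drop = Dropout(0.2)(conc)
    dens = Dense(1)(drop)
    acti = Activation('sigmoid')(dens)

    # The model inputs are the Input tensors, not the Embedding layer.
    Model([inp, tfidf_i], acti).summary()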

Output:

    Model: "model_2"
    __________________________________________________________________________________________________
    Layer (type)                    Output Shape         Param #     Connected to                     
    ==================================================================================================
    input_16 (InputLayer)           [(None, 9890)]       0                                            
    __________________________________________________________________________________________________
    embedding_15 (Embedding)        (None, 9890, 100)    989100      input_16[0][0]                   
    __________________________________________________________________________________________________
    lstm_8 (LSTM)                   (None, 60)           38640       embedding_15[0][0]               
    __________________________________________________________________________________________________
    input_17 (InputLayer)           [(None, 500)]        0                                            
    __________________________________________________________________________________________________
    concatenate_2 (Concatenate)     (None, 560)          0           lstm_8[0][0]                     
                                                                     input_17[0][0]                   
    __________________________________________________________________________________________________
    dropout_1 (Dropout)             (None, 560)          0           concatenate_2[0][0]              
    __________________________________________________________________________________________________
    dense_1 (Dense)                 (None, 1)            561         dropout_1[0][0]                  
    __________________________________________________________________________________________________
    activation_1 (Activation)       (None, 1)            0           dense_1[0][0]                    
    ==================================================================================================
    Total params: 1,028,301
    Trainable params: 39,201
    Non-trainable params: 989,100
    __________________________________________________________________________________________________
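
As a sanity check on the summary, the parameter counts can be recomputed by hand from the layer sizes. The sketch below is plain Python with no Keras dependency. It assumes the usual formulas for Embedding, LSTM (four gates, each with an input kernel, a recurrent kernel and a bias) and Dense layers, and the helper names are made up for this illustration.

```python
# Recompute the parameter counts shown by model.summary() from the layer sizes.
# Helper names are hypothetical; the formulas are the standard ones for
# Embedding, LSTM (with bias) and Dense layers.

def embedding_params(vocab_size: int, emb_dim: int) -> int:
    # One emb_dim-sized vector per vocabulary entry.
    return vocab_size * emb_dim

def lstm_params(units: int, input_dim: int) -> int:
    # 4 gates, each with an input kernel, a recurrent kernel and a bias.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim: int, units: int) -> int:
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units

vocab_size, emb_dim = 9891, 100   # len(term_Index) + 1, emb_Dim
lstm_units, tfidf_dim = 60, 500   # LSTM(60), max_terms_art

emb = embedding_params(vocab_size, emb_dim)
lstm = lstm_params(lstm_units, emb_dim)
concat_width = lstm_units + tfidf_dim   # LSTM output joined with the TFIDF vector
dense = dense_params(concat_width, 1)

print("embedding:", emb)
print("lstm:", lstm)
print("concatenated width:", concat_width)
print("dense:", dense)
print("trainable:", lstm + dense, "total:", emb + lstm + dense)
```

The embedding weights count as non-trainable because the layer is created with trainable=False. None of these counts depend on the sequence length, so switching the token input to shape=(sen_Len,) would change only the reported output shapes, not the number of parameters.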
