How to use my own sentence embeddings in Keras?

Question

I am new to Keras, and I have created my own tf_idf sentence embeddings with shape (no_sentences, embedding_dim). I am trying to feed this matrix as input to an LSTM layer. My network looks like this:

q1_tfidf = Input(name='q1_tfidf', shape=(max_sent, 300))
q2_tfidf = Input(name='q2_tfidf', shape=(max_sent, 300))

q1_tfidf = LSTM(100)(q1_tfidf)
q2_tfidf = LSTM(100)(q2_tfidf)
distance2 = Lambda(preprocessing.exponent_neg_manhattan_distance, output_shape=preprocessing.get_shape)(
        [q1_tfidf, q2_tfidf])

I am struggling with how the matrix should be shaped. I am getting this error:

ValueError: Error when checking input: expected q1_tfidf to have 3 dimensions, but got array with shape (384348, 300)

I have already checked this post: Sentence Embedding Keras, but I still cannot figure it out. It seems like I am missing something obvious.

Any idea how to do this?

Answer 1

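OK, as far as I understand, you want to predict the difference between two sentences. How about reusing the LSTM layer (the language model should be the same for both), so that you learn a single sentence embedding and use it twice: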

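from keras.layers import Input, LSTM, concatenate
from keras.models import Model

q1_tfidf = Input(name='q1_tfidf', shape=(max_sent, 300))
q2_tfidf = Input(name='q2_tfidf', shape=(max_sent, 300))

# One LSTM instance, so both sentences are encoded with the same weights
lstm = LSTM(100)

lstm_out_q1 = lstm(q1_tfidf)
lstm_out_q2 = lstm(q2_tfidf)
predict = concatenate([lstm_out_q1, lstm_out_q2])
# Both inputs must be listed; passing q1_tfidf twice leaves q2_tfidf unconnected
model = Model(inputs=[q1_tfidf, q2_tfidf], outputs=predict)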

predict = concatenate([q1_tfidf , q2_tfidf])

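You could also introduce your custom distance in an additional Lambda layer, but in that case you need a different reshaping than the one used for the concatenation.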
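As for the error itself: an LSTM expects 3-dimensional input of shape (batch, timesteps, features), while the tf_idf matrix is 2-dimensional, (no_sentences, 300). Below is a minimal sketch of one way to make the shapes agree, combined with the Lambda-layer distance mentioned above. It assumes each sentence is represented by a single 300-dimensional vector, so a time axis of length 1 is added with `np.expand_dims`. The distance function is my guess at what `preprocessing.exponent_neg_manhattan_distance` computes (the original is not shown), and the random arrays are only placeholders for the real data.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Lambda
from tensorflow.keras.models import Model

embedding_dim = 300
timesteps = 1  # assumption: one tf_idf vector per sentence


def exponent_neg_manhattan_distance(tensors):
    # Assumed definition: exp(-L1 distance) between the two encodings.
    left, right = tensors
    return tf.exp(-tf.reduce_sum(tf.abs(left - right), axis=1, keepdims=True))


q1_in = Input(name='q1_tfidf', shape=(timesteps, embedding_dim))
q2_in = Input(name='q2_tfidf', shape=(timesteps, embedding_dim))

# Shared LSTM: the same weights encode both sentences.
shared_lstm = LSTM(100)
q1_out = shared_lstm(q1_in)
q2_out = shared_lstm(q2_in)

distance = Lambda(exponent_neg_manhattan_distance,
                  output_shape=(1,))([q1_out, q2_out])
model = Model(inputs=[q1_in, q2_in], outputs=distance)

# Placeholder data standing in for the real (no_sentences, 300) matrices.
q1_matrix = np.random.rand(8, embedding_dim).astype('float32')
q2_matrix = np.random.rand(8, embedding_dim).astype('float32')

# Add the missing time axis: (8, 300) -> (8, 1, 300).
q1_data = np.expand_dims(q1_matrix, axis=1)
q2_data = np.expand_dims(q2_matrix, axis=1)

similarity = model.predict([q1_data, q2_data], verbose=0)
```

Note that with a single timestep the LSTM has no sequence to work through, so a Dense layer applied directly to the (no_sentences, 300) matrix may serve just as well. The 3-dimensional layout pays off mainly when you have one vector per word, i.e. input of shape (no_sentences, max_sent, 300).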

nlp

sentence-similarity

lstm

keras

word-embedding