Fine-tuning Universal Sentence Encoder with LSTM
Input data:
string_1_A, string_2_A, string_3_A, label_A
string_1_B, string_2_B, string_3_B, label_B
...
string_1_Z, string_2_Z, string_3_Z, label_Z
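I want to use Universal Sentence Encoder (v4) to embed each string (they will be sentences) and then feed the embeddings into an LSTM to make a prediction for the sequence. I ended up with the following code: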
import tensorflow_hub as hub
import tensorflow as tf
import tensorflow.keras.backend as K
from tensorflow.keras.layers import LSTM

module_url = "../resources/embeddings/use-4"

def get_lstm_model():
    embedding_layer = hub.KerasLayer(module_url)
    inputs = tf.keras.layers.Input(shape=(3, ), dtype=tf.string)
    x = tf.keras.layers.Lambda(lambda y: tf.expand_dims(embedding_layer(tf.squeeze(y)), 1))(inputs)
    x = LSTM(128, return_sequences=False)(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.compile("adam", K.binary_crossentropy)
    model.summary()
    return model

if __name__ == '__main__':
    model = get_lstm_model()
    print(model.predict([[["a"], ["b"], ["c"]]]))
The problem is that the input/output dimensions of some layers don't match what I expect (the sequence axis is 1 where I would expect 3):
input_1 (InputLayer) [(None, 3)] 0
_________________________________________________________________
lambda (Lambda) (None, ***1***, 512) 0
Any suggestions? I think I need to handle the squeeze and unsqueeze better.
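The simplest solution is to pass each string/sentence to the Universal Sentence Encoder separately. The encoder takes a 1-D batch of strings, so each sentence position yields an embedding of shape (None, 512); reshaping each one to (None, 1, 512) and concatenating them along axis 1 gives a tensor of shape (None, n_sentences, 512), which is the layout the LSTM expects.
Here is the code for the model: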
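The shape bookkeeping behind this idea can be checked without loading the encoder at all. The following is only a sketch: `fake_embed` is a hypothetical stand-in that returns zeros of the right shape, and NumPy arrays are used in place of Keras tensors.

```python
import numpy as np

EMBED_DIM = 512  # output size of Universal Sentence Encoder v4


def fake_embed(sentences):
    """Hypothetical stand-in for the encoder: (batch,) strings -> (batch, 512)."""
    sentences = np.asarray(sentences)
    assert sentences.ndim == 1, "the encoder takes a flat batch of strings"
    return np.zeros((sentences.shape[0], EMBED_DIM), dtype=np.float32)


def embed_sequences(batch):
    """Embed each sentence position separately, then join along a time axis."""
    batch = np.asarray(batch)  # (batch_size, n_sentences)
    n_sentences = batch.shape[1]
    per_step = [
        fake_embed(batch[:, s])[:, np.newaxis, :]  # (batch_size, 1, 512)
        for s in range(n_sentences)
    ]
    return np.concatenate(per_step, axis=1)  # (batch_size, n_sentences, 512)


batch = [["a", "b", "c"], ["d", "e", "f"]]
print(embed_sequences(batch).shape)  # (2, 3, 512)
```

Slicing one column at a time keeps every call to the encoder one-dimensional, so no squeeze is needed and the sequence axis is preserved.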
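import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras.layers import Concatenate, Dense, Input, LSTM, Reshape
from tensorflow.keras.models import Model

n_sentences = 50
module_url = "https://tfhub.dev/google/universal-sentence-encoder/4"

def get_lstm_model():
    # trainable=True lets the encoder weights be fine-tuned together with the LSTM
    embedding_layer = hub.KerasLayer(module_url, trainable=True)
    inputs = Input(shape=(n_sentences,), dtype=tf.string)
    # embed one sentence position at a time: (None,) -> (None, 512) -> (None, 1, 512)
    x = [Reshape((1, 512))(embedding_layer(inputs[:, s])) for s in range(n_sentences)]
    x = Concatenate(axis=1)(x)  # (None, n_sentences, 512)
    x = LSTM(128, return_sequences=False)(x)
    outputs = Dense(1, activation="sigmoid")(x)
    model = Model(inputs=inputs, outputs=outputs)
    model.compile("adam", "binary_crossentropy")
    model.summary()
    return model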
At inference time:
model = get_lstm_model()

# each sample is a list of n_sentences strings
sentences = [str(i) for i in range(n_sentences)]

X = [sentences]  # 1 sample
print(model.predict(X).shape)

X = [sentences, sentences[::-1]]  # 2 samples
print(model.predict(X).shape)
Here is the running notebook
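To get from the row layout in the question to the nested list of strings that the model takes, the rows have to be split into sentences and a label. The sketch below assumes the data is comma-separated with an integer 0/1 label in the last column; the sample rows and the `load_rows` helper are hypothetical and only illustrate the layout.

```python
import csv
import io

# Hypothetical file contents: three sentences followed by a label per row.
raw = io.StringIO(
    "first sentence,second sentence,third sentence,1\n"
    "another one,and another,last one,0\n"
)


def load_rows(handle, n_sentences=3):
    """Split each row into n_sentences strings and one integer label."""
    X, y = [], []
    for row in csv.reader(handle):
        if len(row) != n_sentences + 1:
            raise ValueError(f"expected {n_sentences + 1} columns, got {len(row)}")
        X.append(row[:n_sentences])      # the sentences of one sample
        y.append(int(row[n_sentences]))  # the label of that sample
    return X, y


X, y = load_rows(raw)
print(len(X), len(X[0]), y)  # 2 3 [1, 0]
```

Here `X` has the same nested-list form as in the inference snippet, provided `n_sentences` matches the value the model was built with.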