在预测期间使用来自 tensorflow hub 的 Elmo 作为自定义 tf.keras 层的问题
Problem using Elmo from tensorflow hub as custom tf.keras layer during prediction
我正在尝试将来自 tensorflow hub 的 Elmo 与 tf.keras 一起使用来执行 NER。训练很好,损失在减少,测试集也给出了很好的结果。但我无法预测,因为出现以下错误:
2019-05-02 15:41:42.785946: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
Traceback (most recent call last):
File "elmo_eva_brain.py", line 668, in <module>
np.array([['hello', 'world'] + ['--PAD--'] * 18])))
File "/home/ashwanipandey/eva_ml/experimental/eva_brain/venv/lib64/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1113, in predict
self, x, batch_size=batch_size, verbose=verbose, steps=steps)
File "/home/ashwanipandey/eva_ml/experimental/eva_brain/venv/lib64/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 329, in model_iteration
batch_outs = f(ins_batch)
File "/home/ashwanipandey/eva_ml/experimental/eva_brain/venv/lib64/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3076, in __call__
run_metadata=self.run_metadata)
File "/home/ashwanipandey/eva_ml/experimental/eva_brain/venv/lib64/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
run_metadata_ptr)
File "/home/ashwanipandey/eva_ml/experimental/eva_brain/venv/lib64/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: len(seq_lens) != input.dims(0), (256 vs. 1)
[[{{node Embed/elmo/elmo_module_apply_tokens/bilm/ReverseSequence}}]]
[[{{node Tag/t_output/transpose_1}}]]
256 是我训练时的批量大小。我试图预测一个句子。
我试着在网上搜索了很多,但都是徒劳的。任何帮助深表感谢。
如果我重复我的向量 256 次并在预测期间将 batch_size 设置为 256,我绝对可以获得预测。但正如您所见,这是一种非常低效的解决方法。
这是自定义图层的代码
class ElmoEmbeddingLayer(keras.layers.Layer):
def __init__(self, dimensions=1024, batch_size=512, word_size=20, **kwargs):
self.dimensions = 1024
self.trainable = True
self.batch_size = _BATCH_SIZE
self.word_size = _WORD_SIZE
super().__init__(**kwargs)
def build(self, input_shape):
self.elmo = hub.Module('https://tfhub.dev/google/elmo/2', trainable=self.trainable,
name=f"{self.name}_module")
super().build(input_shape)
def call(self, x, mask=None):
result = self.elmo(inputs={
"tokens": K.cast(x, tf.string),
"sequence_len": K.constant(self.batch_size*[self.word_size], dtype=tf.int32)
},
as_dict=True,
signature='tokens',
)['elmo']
return result
def compute_mask(self, inputs, mask=None):
return K.not_equal(inputs, '--PAD--')
def compute_output_shape(self, input_shape):
return (None, self.word_size, self.dimensions)
def get_config(self):
config = {
'dimensions': self.dimensions,
'trainable': self.trainable,
'batch_size': self.batch_size,
'word_size': self.word_size
}
base_config = super().get_config()
return dict(list(base_config.items()) + list(config.items()))
这是我的模型架构:
model architecture
样本数量(在训练和测试集中)必须能被 batch_size 整除。否则 keras 中的最后一批将破坏架构。
因此,例如,一种解决方案是使用 split_tr 之前的样本进行训练,使用 split_te 进行预测:
split_tr = (X_train.shape[0]//BATCH_SIZE)*BATCH_SIZE
split_te = (X_test.shape[0]//BATCH_SIZE)*BATCH_SIZE
model.fit(X_train_text[:split_tr], y_train[:split_tr], batch_size=BATCH_SIZE, epochs=15, validation_data=(X_test_text[:split_te], y_test[:split_te]), verbose=1)
我和你有同样的问题,在 RNN ELMo post-tagger 模型上工作。最后我按照解决方案批量预测并保留我想要的测试样本:
model.predict([X_test[:split_te]], batch_size=256)[0]
有关更多想法(例如复制权重),请查看 here!
我正在尝试将来自 tensorflow hub 的 Elmo 与 tf.keras 一起使用来执行 NER。训练很好,损失在减少,测试集也给出了很好的结果。但我无法预测,因为出现以下错误:
2019-05-02 15:41:42.785946: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
Traceback (most recent call last):
File "elmo_eva_brain.py", line 668, in <module>
np.array([['hello', 'world'] + ['--PAD--'] * 18])))
File "/home/ashwanipandey/eva_ml/experimental/eva_brain/venv/lib64/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1113, in predict
self, x, batch_size=batch_size, verbose=verbose, steps=steps)
File "/home/ashwanipandey/eva_ml/experimental/eva_brain/venv/lib64/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 329, in model_iteration
batch_outs = f(ins_batch)
File "/home/ashwanipandey/eva_ml/experimental/eva_brain/venv/lib64/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3076, in __call__
run_metadata=self.run_metadata)
File "/home/ashwanipandey/eva_ml/experimental/eva_brain/venv/lib64/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
run_metadata_ptr)
File "/home/ashwanipandey/eva_ml/experimental/eva_brain/venv/lib64/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: len(seq_lens) != input.dims(0), (256 vs. 1)
[[{{node Embed/elmo/elmo_module_apply_tokens/bilm/ReverseSequence}}]]
[[{{node Tag/t_output/transpose_1}}]]
256 是我训练时的批量大小。我试图预测一个句子。
我试着在网上搜索了很多,但都是徒劳的。任何帮助深表感谢。 如果我重复我的向量 256 次并在预测期间将 batch_size 设置为 256,我绝对可以获得预测。但正如您所见,这是一种非常低效的解决方法。
这是自定义图层的代码
class ElmoEmbeddingLayer(keras.layers.Layer):
def __init__(self, dimensions=1024, batch_size=512, word_size=20, **kwargs):
self.dimensions = 1024
self.trainable = True
self.batch_size = _BATCH_SIZE
self.word_size = _WORD_SIZE
super().__init__(**kwargs)
def build(self, input_shape):
self.elmo = hub.Module('https://tfhub.dev/google/elmo/2', trainable=self.trainable,
name=f"{self.name}_module")
super().build(input_shape)
def call(self, x, mask=None):
result = self.elmo(inputs={
"tokens": K.cast(x, tf.string),
"sequence_len": K.constant(self.batch_size*[self.word_size], dtype=tf.int32)
},
as_dict=True,
signature='tokens',
)['elmo']
return result
def compute_mask(self, inputs, mask=None):
return K.not_equal(inputs, '--PAD--')
def compute_output_shape(self, input_shape):
return (None, self.word_size, self.dimensions)
def get_config(self):
config = {
'dimensions': self.dimensions,
'trainable': self.trainable,
'batch_size': self.batch_size,
'word_size': self.word_size
}
base_config = super().get_config()
return dict(list(base_config.items()) + list(config.items()))
这是我的模型架构: model architecture
样本数量(在训练和测试集中)必须能被 batch_size 整除。否则 keras 中的最后一批将破坏架构。 因此,例如,一种解决方案是使用 split_tr 之前的样本进行训练,使用 split_te 进行预测:
split_tr = (X_train.shape[0]//BATCH_SIZE)*BATCH_SIZE
split_te = (X_test.shape[0]//BATCH_SIZE)*BATCH_SIZE
model.fit(X_train_text[:split_tr], y_train[:split_tr], batch_size=BATCH_SIZE, epochs=15, validation_data=(X_test_text[:split_te], y_test[:split_te]), verbose=1)
我和你有同样的问题,在 RNN ELMo post-tagger 模型上工作。最后我按照解决方案批量预测并保留我想要的测试样本:
model.predict([X_test[:split_te]], batch_size=256)[0]
有关更多想法(例如复制权重),请查看 here!