Albert_base: weights from ckpt are not loaded properly when calling with bert-for-tf2
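I want to further fine-tune Albert_base on an MLM task, but I realized that no pretrained ckpt files are provided for albert-base. So my plan is to convert the saved_model (or the model loaded from tf-hub) to a checkpoint myself, and then further pretrain albert-base with the provided code (https://github.com/google-research/ALBERT/blob/master/run_pretraining.py).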
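Before pretraining further, I wanted to check whether the conversion to ckpt was successful, so I converted the ckpt files back to saved_model format and loaded the result with bert-for-tf2 (https://github.com/kpe/bert-for-tf2/tree/master/bert).
However, when I load the re-converted albert_base, its embeddings differ from the ones I get from the original albert_base.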
Below is how I converted the original saved_model to ckpt and then back to saved_model. (I used TensorFlow 1.15.0 on Colab.)
import tensorflow as tf  # tf version 1.15.0
import tensorflow_hub as hub

"""
Convert tf-hub module to checkpoint files.
"""
# hub.Module creates its variables under the "module/" scope by default,
# so tf.train.Saver() writes names such as 'module/bert/embeddings/word_embeddings'.
albert_module = hub.Module(
    "https://tfhub.dev/google/albert_base/2",
    trainable=True)
saver = tf.train.Saver()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
saver.save(sess, './albert/model_ckpt/albert_base')

"""
Save model loaded from ckpt in saved_model format.
"""
graph = tf.Graph()
with tf.Session(graph=graph) as sess:
    # Restore from checkpoint
    loader = tf.train.import_meta_graph('./albert/model_ckpt/albert_base.meta')
    loader.restore(sess, tf.train.latest_checkpoint('./albert/model_ckpt/'))
    # Export checkpoint to SavedModel (empty tag list)
    builder = tf.saved_model.builder.SavedModelBuilder('./albert/saved_model')
    builder.add_meta_graph_and_variables(sess,
                                         [],
                                         strip_default_attrs=True)
    builder.save()
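The names stored in the checkpoint can also be checked directly, which is a quicker way to verify the conversion than round-tripping through saved_model. A minimal sketch, assuming TensorFlow is installed and the checkpoint prefix './albert/model_ckpt/albert_base' from above; `top_level_scopes` and `checkpoint_variable_names` are hypothetical helpers, while `tf.train.list_variables` is the standard TensorFlow call that returns (name, shape) pairs.

```python
def top_level_scopes(names):
    """Return the sorted set of first path components, e.g. 'module' for 'module/bert/...'."""
    return sorted({name.split("/", 1)[0] for name in names})


def checkpoint_variable_names(ckpt_path):
    """List the variable names stored in a TF checkpoint.

    TensorFlow is imported lazily so top_level_scopes() can be used without it.
    """
    import tensorflow as tf

    return [name for name, _shape in tf.train.list_variables(ckpt_path)]
```

For example, `top_level_scopes(checkpoint_variable_names('./albert/model_ckpt/albert_base'))`: a result of `['module']` instead of `['bert', 'cls']` would mean the hub.Module scope was saved into the checkpoint.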
Using bert-for-tf2, I load albert_base as a Keras layer and build a simple model:
import bert  # bert-for-tf2
from tensorflow.keras.layers import Input, AveragePooling1D, Flatten
from tensorflow.keras.models import Model

def load_pretrained_albert():
    model_name = "albert_base"
    model_dir = bert.fetch_tfhub_albert_model(model_name, ".models")
    model_params = bert.albert_params(model_name)
    l_bert = bert.BertModelLayer.from_params(model_params, name="albert")

    # use in Keras Model here, and call model.build()
    max_seq_len = 128
    l_input_ids = Input(shape=(max_seq_len,), dtype='int32', name="l_input_ids")
    output = l_bert(l_input_ids)  # output: [batch_size, max_seq_len, hidden_size]
    # Average over the sequence axis to get one vector per input
    pooled_output = AveragePooling1D(pool_size=max_seq_len, data_format="channels_last")(output)
    pooled_output = Flatten()(pooled_output)

    model = Model(inputs=[l_input_ids], outputs=[pooled_output])
    model.build(input_shape=(None, max_seq_len))

    # Load the weights from the saved_model in model_dir into the ALBERT layer
    bert.load_albert_weights(l_bert, model_dir)
    return model
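The code above loads the weights from the saved_model. The problem is that when I overwrite the original albert_base saved_model with the one I re-converted from the checkpoint, the resulting embeddings are different.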
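To put a number on "different", the pooled outputs of the two models can be compared on the same input. A sketch assuming NumPy and two models built with `load_pretrained_albert()`, one per saved_model; `compare_embeddings` is a hypothetical helper.

```python
import numpy as np


def compare_embeddings(a, b, atol=1e-5):
    """Return (max absolute difference, whether all values agree within atol)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    max_diff = float(np.max(np.abs(a - b)))
    return max_diff, bool(np.allclose(a, b, rtol=0.0, atol=atol))
```

For example, with arbitrary token ids `ids = np.random.randint(0, 1000, size=(2, 128))`, call `compare_embeddings(original_model.predict(ids), reconverted_model.predict(ids))`. A large maximum difference points to weights that were not loaded at all rather than to numerical noise.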
When I run the code above with the re-converted saved_model, the following warnings appear:
model = load_pretrained_albert()
Fetching ALBERT model: albert_base version: 2
Already fetched: albert_base.tar.gz
already unpacked at: .models\albert_base
loader: No value for:[albert_4/embeddings/word_embeddings/embeddings:0], i.e.:[bert/embeddings/word_embeddings] in:[.models\albert_base]
loader: No value for:[albert_4/embeddings/word_embeddings_projector/projector:0], i.e.:[bert/encoder/embedding_hidden_mapping_in/kernel] in:[.models\albert_base]
loader: No value for:[albert_4/embeddings/word_embeddings_projector/bias:0], i.e.:[bert/encoder/embedding_hidden_mapping_in/bias] in:[.models\albert_base]
loader: No value for:[albert_4/embeddings/position_embeddings/embeddings:0], i.e.:[bert/embeddings/position_embeddings] in:[.models\albert_base]
loader: No value for:[albert_4/embeddings/LayerNorm/gamma:0], i.e.:[bert/embeddings/LayerNorm/gamma] in:[.models\albert_base]
loader: No value for:[albert_4/embeddings/LayerNorm/beta:0], i.e.:[bert/embeddings/LayerNorm/beta] in:[.models\albert_base]
loader: No value for:[albert_4/encoder/layer_shared/attention/self/query/kernel:0], i.e.:[bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel] in:[.models\albert_base]
loader: No value for:[albert_4/encoder/layer_shared/attention/self/query/bias:0], i.e.:[bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias] in:[.models\albert_base]
loader: No value for:[albert_4/encoder/layer_shared/attention/self/key/kernel:0], i.e.:[bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel] in:[.models\albert_base]
loader: No value for:[albert_4/encoder/layer_shared/attention/self/key/bias:0], i.e.:[bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias] in:[.models\albert_base]
loader: No value for:[albert_4/encoder/layer_shared/attention/self/value/kernel:0], i.e.:[bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel] in:[.models\albert_base]
loader: No value for:[albert_4/encoder/layer_shared/attention/self/value/bias:0], i.e.:[bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias] in:[.models\albert_base]
loader: No value for:[albert_4/encoder/layer_shared/attention/output/dense/kernel:0], i.e.:[bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel] in:[.models\albert_base]
loader: No value for:[albert_4/encoder/layer_shared/attention/output/dense/bias:0], i.e.:[bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias] in:[.models\albert_base]
loader: No value for:[albert_4/encoder/layer_shared/attention/output/LayerNorm/gamma:0], i.e.:[bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma] in:[.models\albert_base]
loader: No value for:[albert_4/encoder/layer_shared/attention/output/LayerNorm/beta:0], i.e.:[bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta] in:[.models\albert_base]
loader: No value for:[albert_4/encoder/layer_shared/intermediate/kernel:0], i.e.:[bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel] in:[.models\albert_base]
loader: No value for:[albert_4/encoder/layer_shared/intermediate/bias:0], i.e.:[bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias] in:[.models\albert_base]
loader: No value for:[albert_4/encoder/layer_shared/output/dense/kernel:0], i.e.:[bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel] in:[.models\albert_base]
loader: No value for:[albert_4/encoder/layer_shared/output/dense/bias:0], i.e.:[bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias] in:[.models\albert_base]
loader: No value for:[albert_4/encoder/layer_shared/output/LayerNorm/gamma:0], i.e.:[bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma] in:[.models\albert_base]
loader: No value for:[albert_4/encoder/layer_shared/output/LayerNorm/beta:0], i.e.:[bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta] in:[.models\albert_base]
Done loading 0 BERT weights from: .models\albert_base into <bert.model.BertModelLayer object at 0x0000029687449D68> (prefix:albert_4). Count of weights not found in the checkpoint was: [22]. Count of weights with mismatched shape: [0]
Unused weights from saved model:
module/bert/embeddings/LayerNorm/beta
module/bert/embeddings/LayerNorm/gamma
module/bert/embeddings/position_embeddings
module/bert/embeddings/token_type_embeddings
module/bert/embeddings/word_embeddings
module/bert/encoder/embedding_hidden_mapping_in/bias
module/bert/encoder/embedding_hidden_mapping_in/kernel
module/bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta
module/bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma
module/bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta
module/bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma
module/bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias
module/bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel
module/bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias
module/bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel
module/bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias
module/bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel
module/bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias
module/bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel
module/bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias
module/bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel
module/bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias
module/bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel
module/bert/pooler/dense/bias
module/bert/pooler/dense/kernel
module/cls/predictions/output_bias
module/cls/predictions/transform/LayerNorm/beta
module/cls/predictions/transform/LayerNorm/gamma
module/cls/predictions/transform/dense/bias
module/cls/predictions/transform/dense/kernel
Whereas when I run it with the original albert_base, the output is:
model = load_pretrained_albert()
Fetching ALBERT model: albert_base version: 2
Already fetched: albert_base.tar.gz
already unpacked at: .models\albert_base
Done loading 22 BERT weights from: .models\albert_base into <bert.model.BertModelLayer object at 0x0000029680196320> (prefix:albert_5). Count of weights not found in the checkpoint was: [0]. Count of weights with mismatched shape: [0]
Unused weights from saved model:
bert/embeddings/token_type_embeddings
bert/pooler/dense/bias
bert/pooler/dense/kernel
cls/predictions/output_bias
cls/predictions/transform/LayerNorm/beta
cls/predictions/transform/LayerNorm/gamma
cls/predictions/transform/dense/bias
cls/predictions/transform/dense/kernel
As far as I understand, the weights are not loaded correctly because the names differ.
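Comparing the two logs, the names bert-for-tf2 looks for (the `i.e.:[...]` part of each warning) appear in the re-converted model's "Unused weights" list with an extra leading `module/`. A small pure-Python sketch that checks this, using a few names copied from the logs; `match_after_stripping` is a hypothetical helper, not part of bert-for-tf2.

```python
def match_after_stripping(expected, available, prefix="module/"):
    """Split expected names into those found in `available` once `prefix`
    is removed from each available name, and those still missing."""
    stripped = {n[len(prefix):] if n.startswith(prefix) else n for n in available}
    found = [n for n in expected if n in stripped]
    missing = [n for n in expected if n not in stripped]
    return found, missing


# Names bert-for-tf2 looked for (from the "i.e.:[...]" part of the warnings)
expected = [
    "bert/embeddings/word_embeddings",
    "bert/encoder/embedding_hidden_mapping_in/kernel",
    "bert/embeddings/LayerNorm/beta",
]
# Names present in the re-converted saved_model (from "Unused weights from saved model")
available = [
    "module/bert/embeddings/word_embeddings",
    "module/bert/encoder/embedding_hidden_mapping_in/kernel",
    "module/bert/embeddings/LayerNorm/beta",
    "module/cls/predictions/output_bias",
]

found, missing = match_after_stripping(expected, available)
print(f"found {len(found)}, missing {len(missing)}")  # prints: found 3, missing 0
```

With `prefix=""` nothing is stripped and all three names are reported missing, which mirrors the "Done loading 0 BERT weights" line.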
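Is there a way to specify the names under which the weights are saved in ckpt format? I think the problem would be solved if, for example, the weight 'module/bert/embeddings/LayerNorm/beta' were saved as 'bert/embeddings/LayerNorm/beta' when saving in ckpt format. How can I get rid of the 'module/' part?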
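In TF 1.x, `tf.train.Saver` accepts `var_list` as a dict mapping checkpoint names to variables, so the names can be chosen at save time. A minimal sketch, assuming TF 1.15 with tensorflow_hub and the hub URL and paths used above; `strip_scope` and `save_without_scope` are hypothetical helpers.

```python
def strip_scope(name, scope="module/"):
    """Drop the tensor suffix (':0') and a leading scope from a variable name."""
    name = name.split(":")[0]
    return name[len(scope):] if name.startswith(scope) else name


def save_without_scope(hub_url, ckpt_prefix, scope="module/"):
    """Save a TF-Hub module's variables to a checkpoint under names without `scope`.

    Assumes TF 1.15 and tensorflow_hub; both are imported lazily here.
    """
    import tensorflow as tf
    import tensorflow_hub as hub

    with tf.Graph().as_default():
        hub.Module(hub_url, trainable=True)  # variables are created under 'module/...'
        # Dict form of var_list: checkpoint name -> variable
        var_list = {strip_scope(v.name, scope): v for v in tf.global_variables()}
        saver = tf.train.Saver(var_list=var_list)
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            saver.save(sess, ckpt_prefix)
```

Usage: `save_without_scope("https://tfhub.dev/google/albert_base/2", "./albert/model_ckpt/albert_base")`. This writes 'bert/...' and 'cls/...' names, matching the original saved_model's naming.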
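I may have made the problem sound more complicated than it is, but I tried to explain my situation in as much detail as possible, just in case :)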
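Problem solved! The issue was indeed the difference in tensor names.
I renamed the tensors in the checkpoint using the following code (https://gist.github.com/batzner/7c24802dd9c5e15870b4b56e22135c96).
All I needed to do was change 'module/bert/....' to 'bert/....'.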
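For an existing checkpoint, the renaming can be done by reading every variable and writing it back under its new name. The sketch below follows the same general approach as the linked gist but is not a copy of it. It is written against TF 1.15 through `tf.compat.v1`, and `new_name` / `rename_checkpoint` are hypothetical names.

```python
def new_name(name, old_prefix="module/bert/", new_prefix="bert/"):
    """Map 'module/bert/...' to 'bert/...'; leave other names unchanged."""
    if name.startswith(old_prefix):
        return new_prefix + name[len(old_prefix):]
    return name


def rename_checkpoint(src_ckpt, dst_ckpt):
    """Copy every variable from src_ckpt into dst_ckpt under new_name(...)."""
    import tensorflow.compat.v1 as tf  # lazy import; graph-mode TF1 API

    with tf.Graph().as_default():
        for name, _shape in tf.train.list_variables(src_ckpt):
            value = tf.train.load_variable(src_ckpt, name)  # numpy array
            tf.Variable(value, name=new_name(name))
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            tf.train.Saver().save(sess, dst_ckpt)
```

The defaults match the 'module/bert/' to 'bert/' change described above, which is what bert-for-tf2 needs. Passing `old_prefix="module/", new_prefix=""` to `new_name` would also rename the 'module/cls/...' MLM-head variables. That might matter if run_pretraining.py should restore the MLM head too; this is an assumption.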