`logits` and `labels` must have the same shape, received ((None, 512, 768) vs (None, 1)) when using transformers
I'm getting the error below while trying to fine-tune a BERT model for sentiment analysis. I load a pre-trained model, build the dataset, and then try to fit the model, but I always hit the same error about the logits and labels.
My inputs are:
X - a list of strings containing the tweets
y - a list of numbers (0 = negative, 1 = positive), converted from the class labels (negative and positive)
import tensorflow as tf
from tensorflow.keras.losses import BinaryCrossentropy
from transformers import BertTokenizer, TFBertModel

#LOAD MODEL
hugging_face_model = 'distilbert-base-uncased-finetuned-sst-2-english'
batches = 32
epochs = 1
tokenizer = BertTokenizer.from_pretrained(hugging_face_model)
model = TFBertModel.from_pretrained(hugging_face_model, num_labels=2)

#PREPARE THE DATASET
#create a list of strings (tweets)
lst = list(X_train_lower['lower_text'].values)
encoded_input = tokenizer(lst, truncation=True, padding=True, return_tensors='tf')
y_train['sentimentNumber'] = y_train['sentiment'].replace({'negative': 0, 'positive': 1})
label_list = list(y_train['sentimentNumber'].values)

#CREATE DATASET
train_dataset = tf.data.Dataset.from_tensor_slices((dict(encoded_input), label_list))

#COMPILE AND FIT THE MODEL
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5), loss=BinaryCrossentropy(from_logits=True), metrics=["accuracy"])
model.fit(train_dataset.shuffle(len(df)).batch(batches), epochs=epochs, batch_size=batches)
ValueError Traceback (most recent call last)
<ipython-input-158-e5b63f982311> in <module>()
----> 1 model.fit(train_dataset.shuffle(len(df)).batch(batches),epochs=epochs,batch_size=batches)
1 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in autograph_handler(*args, **kwargs)
1145 except Exception as e: # pylint:disable=broad-except
1146 if hasattr(e, "ag_error_metadata"):
-> 1147 raise e.ag_error_metadata.to_exception(e)
1148 else:
1149 raise
ValueError: in user code:
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1021, in train_function *
return step_function(self, iterator)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1010, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1000, in run_step **
outputs = model.train_step(data)
File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_tf_utils.py", line 1000, in train_step
loss = self.compiled_loss(y, y_pred, sample_weight, regularization_losses=self.losses)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py", line 201, in __call__
loss_value = loss_obj(y_t, y_p, sample_weight=sw)
File "/usr/local/lib/python3.7/dist-packages/keras/losses.py", line 141, in __call__
losses = call_fn(y_true, y_pred)
File "/usr/local/lib/python3.7/dist-packages/keras/losses.py", line 245, in call **
return ag_fn(y_true, y_pred, **self._fn_kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/losses.py", line 1932, in binary_crossentropy
backend.binary_crossentropy(y_true, y_pred, from_logits=from_logits),
File "/usr/local/lib/python3.7/dist-packages/keras/backend.py", line 5247, in binary_crossentropy
return tf.nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output)
ValueError: `logits` and `labels` must have the same shape, received ((None, 512, 768) vs (None, 1)).
As explained in this Kaggle notebook, you have to build a custom Keras model around the pre-trained BERT model to perform classification. TFBertModel is, per its docstring,

The bare Bert Model transformer outputting raw hidden-states without any specific head on top

so what Keras feeds to the loss is the last_hidden_state of shape (batch_size, sequence_length, hidden_size), i.e. (None, 512, 768), while your labels have shape (None, 1); hence the error.
Here is a copy of the code:
from tensorflow.keras.optimizers import Adam

def create_model(bert_model):
    input_ids = tf.keras.Input(shape=(60,), dtype='int32')
    attention_masks = tf.keras.Input(shape=(60,), dtype='int32')
    output = bert_model([input_ids, attention_masks])
    # output[1] is the pooler_output: one (batch_size, 768) vector per sequence
    output = output[1]
    output = tf.keras.layers.Dense(32, activation='relu')(output)
    output = tf.keras.layers.Dropout(0.2)(output)
    # single sigmoid unit, matching the (None, 1) binary labels
    output = tf.keras.layers.Dense(1, activation='sigmoid')(output)
    model = tf.keras.models.Model(inputs=[input_ids, attention_masks], outputs=output)
    model.compile(Adam(learning_rate=6e-6), loss='binary_crossentropy', metrics=['accuracy'])
    return model
Note: you may need to adapt this code, in particular the input shape (from 60 to 512, your tokenizer's maximum length, judging by the error message). See the sketch below.
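For example, a minimal sketch of how the tokenizer output could be matched to the model's two Input layers, assuming shape=(512,) in create_model and reusing lst and label_list from the question:
# Pad every tweet to the fixed length expected by the Input layers
# (512 here is an assumption taken from the error message).
encoded = tokenizer(lst, truncation=True, padding='max_length',
                    max_length=512, return_tensors='tf')
# A two-input dataset: (input_ids, attention_mask) paired with the labels.
train_dataset = tf.data.Dataset.from_tensor_slices(
    ((encoded['input_ids'], encoded['attention_mask']), label_list))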
Load the BERT model and build the classifier:
from transformers import TFBertModel
bert_model = TFBertModel.from_pretrained(hugging_face_model)
model = create_model(bert_model)
model.summary()
The summary:
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 60)] 0 []
input_2 (InputLayer) [(None, 60)] 0 []
tf_bert_model_1 (TFBertModel) TFBaseModelOutputWi 109482240 ['input_1[0][0]',
thPoolingAndCrossAt 'input_2[0][0]']
tentions(last_hidde
n_state=(None, 60,
768),
pooler_output=(Non
e, 768),
past_key_values=No
ne, hidden_states=N
one, attentions=Non
e, cross_attentions
=None)
dense (Dense) (None, 32) 24608 ['tf_bert_model_1[0][1]']
dropout_74 (Dropout) (None, 32) 0 ['dense[0][0]']
dense_1 (Dense) (None, 1) 33 ['dropout_74[0][0]']
==================================================================================================
Total params: 109,506,881
Trainable params: 109,506,881
Non-trainable params: 0
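With a dataset shaped like that, fitting works as usual; a minimal sketch, assuming the batch size and single epoch from the question:
# train_dataset is the two-input dataset sketched above
model.fit(train_dataset.shuffle(len(label_list)).batch(32), epochs=1)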