RuntimeError: CUDA error: device-side assert triggered - BART model
I am trying to run the BART language model for a text-generation task.
My code works fine with another encoder-decoder model (T5), but with BART I get this error:
File "train_bart.py", line 89, in train
outputs = model(input_ids = ids, attention_mask = mask, decoder_input_ids=y_ids, labels=lm_labels) cs-lab-host1" 12:39 10-Aug-21
File ".../venv/tf_23/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File ".../venv/tf_23/lib/python3.6/site-packages/transformers/models/bart/modeling_bart.py", line 1308, in forward
return_dict=return_dict,
File ".../venv/tf_23/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File ".../venv/tf_23/lib/python3.6/site-packages/transformers/models/bart/modeling_bart.py", line 1196, in forward
return_dict=return_dict,
File ".../venv/tf_23/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File ".../venv/tf_23/lib/python3.6/site-packages/transformers/models/bart/modeling_bart.py", line 985, in forward
attention_mask, input_shape, inputs_embeds, past_key_values_length
File ".../venv/tf_23/lib/python3.6/site-packages/transformers/models/bart/modeling_bart.py", line 866, in _prepare_decoder_attent
ion_mask
).to(self.device)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Here is where the error occurs:
for _, data in tqdm(enumerate(loader, 0), total=len(loader), desc='Processing batches..'):
    y = data['target_ids'].to(device, dtype=torch.long)
    y_ids = y[:, :-1].contiguous()           # decoder input: target shifted right
    lm_labels = y[:, 1:].clone().detach()    # labels: target shifted left
    lm_labels[y[:, 1:] == tokenizer.pad_token_id] = -100  # mask pad tokens out of the loss
    ids = data['source_ids'].to(device, dtype=torch.long)
    mask = data['source_mask'].to(device, dtype=torch.long)
    outputs = model(input_ids=ids, attention_mask=mask, decoder_input_ids=y_ids, labels=lm_labels)
    loss = outputs[0]
loader holds the tokenized data.
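For reference, the batches are built roughly like this (a minimal sketch; the Seq2SeqDataset class and max_len are illustrative, but the field names match the loop above):

from torch.utils.data import Dataset

class Seq2SeqDataset(Dataset):
    """Hypothetical wrapper around tokenized (source, target) text pairs."""
    def __init__(self, sources, targets, tokenizer, max_len=512):
        self.enc = tokenizer(sources, max_length=max_len, padding='max_length',
                             truncation=True, return_tensors='pt')
        self.dec = tokenizer(targets, max_length=max_len, padding='max_length',
                             truncation=True, return_tensors='pt')
    def __len__(self):
        return self.enc['input_ids'].size(0)
    def __getitem__(self, i):
        return {'source_ids': self.enc['input_ids'][i],
                'source_mask': self.enc['attention_mask'][i],
                'target_ids': self.dec['input_ids'][i]}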
I suggest temporarily setting the batch size to 1 and running the code on the CPU to get a more descriptive traceback.
That will tell you where the error actually is.
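Concretely (a sketch; both checks assume the model and loader from the question):

import torch

# Option 1: make kernel launches synchronous so the traceback points at the real call:
#   CUDA_LAUNCH_BLOCKING=1 python train_bart.py
# Option 2: run once on the CPU to get a plain Python error:
device = torch.device('cpu')
model.to(device)

# A device-side assert during an embedding lookup usually means a token id is out
# of range, so it is also worth checking the data directly:
vocab_size = model.config.vocab_size
for _, data in enumerate(loader):
    assert int(data['source_ids'].max()) < vocab_size, 'source id out of range'
    assert int(data['target_ids'].max()) < vocab_size, 'target id out of range'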
After several hours of struggling with this, I found that the error was caused by new tokens added to the BART tokenizer. I therefore needed to resize the model's input embedding matrix:
model.resize_token_embeddings(len(tokenizer))
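For context, roughly how the pieces fit together (a sketch; the facebook/bart-base checkpoint and the <hl> token are just examples):

from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained('facebook/bart-base')
tokenizer.add_tokens(['<hl>'])                 # whatever extra tokens you need
model = BartForConditionalGeneration.from_pretrained('facebook/bart-base')
model.resize_token_embeddings(len(tokenizer))  # grow the embedding matrix to match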
What is still unclear to me is why I could fine-tune the T5 model without resizing the embedding matrix, but not BART.
Perhaps it is because BART shares weights between the input and output layers (I am not sure about that either).
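One way to probe this is to compare the tokenizer length against each checkpoint's embedding-matrix size (a quick check using the standard Hugging Face checkpoint names):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

for name in ('facebook/bart-base', 't5-base'):
    tok = AutoTokenizer.from_pretrained(name)
    mdl = AutoModelForSeq2SeqLM.from_pretrained(name)
    print(name, len(tok), mdl.config.vocab_size)

If I recall correctly, T5 checkpoints ship with a padded vocabulary (32128 rows against the tokenizer's 32100 entries), so a few added tokens still index valid rows, while BART's embedding matrix exactly matches its tokenizer and any added token falls off the end.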