RuntimeError: Unknown device when trying to run AlbertForMaskedLM on colab tpu
I am running the following example code on Colab, taken from https://huggingface.co/transformers/model_doc/albert.html#albertformaskedlm:
import os
import torch
import torch_xla
import torch_xla.core.xla_model as xm
assert os.environ['COLAB_TPU_ADDR']
dev = xm.xla_device()
from transformers import AlbertTokenizer, AlbertForMaskedLM
import torch
tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')
model = AlbertForMaskedLM.from_pretrained('albert-base-v2').to(dev)
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0) # Batch size 1
data = input_ids.to(dev)
outputs = model(data, masked_lm_labels=data)
loss, prediction_scores = outputs[:2]
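Apart from using .to(dev) to move input_ids and model onto the TPU device, I have not changed the example code. Everything seems to have been moved to the TPU without problems, because when I evaluate data I get the following output:
tensor([[ 2, 10975, 15, 51, 1952, 25, 10901, 3]], device='xla:1')
However, when I run this code I get the following error: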
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-5-f756487db8f7> in <module>()
1
----> 2 outputs = model(data, masked_lm_labels=data)
3 loss, prediction_scores = outputs[:2]
9 frames
/usr/local/lib/python3.6/dist-packages/transformers/modeling_albert.py in forward(self, hidden_states, attention_mask, head_mask)
277 attention_output = self.attention(hidden_states, attention_mask, head_mask)
278 ffn_output = self.ffn(attention_output[0])
--> 279 ffn_output = self.activation(ffn_output)
280 ffn_output = self.ffn_output(ffn_output)
281 hidden_states = self.full_layer_layer_norm(ffn_output + attention_output[0])
RuntimeError: Unknown device
Does anyone know what is going on?
The solution is described here: https://github.com/pytorch/xla/issues/1909
Before calling model.to(dev), you need to call xm.send_cpu_data_to_device(model, xm.xla_device()):
model = AlbertForMaskedLM.from_pretrained('albert-base-v2')
model = xm.send_cpu_data_to_device(model, dev)
model = model.to(dev)
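Before moving the model, it can help to confirm that the notebook really is on a TPU runtime. The following is a minimal sketch using only the standard library; the helper name tpu_runtime_problems is hypothetical, and it assumes (as the assert in the question does) that Colab exposes the TPU through the COLAB_TPU_ADDR environment variable.

```python
import importlib.util
import os


def tpu_runtime_problems(env=None):
    """Return a list of human-readable problems; an empty list means both checks passed.

    Hypothetical helper for illustration, not part of torch_xla or transformers.
    """
    env = os.environ if env is None else env
    problems = []
    # Assumption: COLAB_TPU_ADDR is present when a TPU runtime is selected.
    # Using .get() avoids the KeyError that os.environ['COLAB_TPU_ADDR'] raises.
    if not env.get("COLAB_TPU_ADDR"):
        problems.append("COLAB_TPU_ADDR is not set (is a TPU runtime selected?)")
    # find_spec returns None when the top-level package cannot be found.
    if importlib.util.find_spec("torch_xla") is None:
        problems.append("torch_xla is not installed")
    return problems


if __name__ == "__main__":
    for problem in tpu_runtime_problems():
        print("warning:", problem)
```

An empty result only shows that the environment looks right; it does not replace the send_cpu_data_to_device step above.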
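There were also some problems getting the gelu activation function that ALBERT uses to work on TPU, so you need to use the following branch of transformers when working on TPU: https://github.com/huggingface/transformers/tree/fix-jit-tpu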
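For context, the call that fails in the traceback is the activation, and the activation in question is gelu. Below is a plain-Python sketch of the two usual formulations of GELU (the exact erf form and the tanh approximation), for reference only. It is not the code from the fix-jit-tpu branch, the function names are illustrative, and which variant a given ALBERT checkpoint uses depends on its config.

```python
import math


def gelu_erf(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """Tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    # Compare the two forms at a few points; they agree closely but not exactly.
    for value in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(value, gelu_erf(value), gelu_tanh(value))
```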
See the following Colab notebook (by https://github.com/jysohn23) for the full solution: https://colab.research.google.com/gist/jysohn23/68d620cda395eab66289115169f43900/getting-started-with-pytorch-on-cloud-tpus.ipynb