Expected object of device type cuda but got device type cpu in PyTorch
I have the following code, which computes the loss function:
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSE_loss(nn.Module):
    """
    : metric: L1, L2 norms or cosine similarity
    : mode: training or evaluation mode
    """

    def __init__(self, metric, mode, weighted_sum = False):
        super(MSE_loss, self).__init__()
        self.metric = metric.lower()
        self.loss_function = nn.MSELoss()
        self.mode = mode.lower()
        self.weighted_sum = weighted_sum

    def forward(self, output1, output2, labels):
        self.labels = labels
        self.linear = nn.Linear(output1.size()[0],1)

        if self.metric == 'cos':
            self.d = F.cosine_similarity(output1, output2)
        elif self.metric == 'l1':
            self.d = torch.abs(output1-output2)
        elif self.metric == 'l2':
            self.d = torch.sqrt((output1-output2)**2)

        def dimensional_reduction(forward):
            if self.weighted_sum:
                distance = self.linear(self.d)
            else:
                distance = torch.mean(self.d,1)
            return distance

        def estimate_loss(forward):
            distance = dimensional_reduction(self.d)
            pred = torch.exp(-distance)
            pred = torch.round(pred)
            loss = self.loss_function(pred, self.labels)
            return pred, loss

        pred, loss = estimate_loss(self.d)

        if self.mode == 'training':
            return loss
        else:
            return pred, loss
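Given
criterion = MSE_loss('l1','training', weighted_sum = True)
I want to get the distance after it has passed through the self.linear neuron when the criterion is executed. However, I get the error "Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm", which indicates that something is wrong. I have omitted the first part of the code, but I am providing the full error message so that you can see what is happening.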
RuntimeError Traceback (most recent call last)
<ipython-input-253-781ed4791260> in <module>()
7 criterion = MSE_loss('l1','training', weighted_sum = True)
8
----> 9 train(test_net, train_loader, 10, batch_size, optimiser, clip, criterion)
<ipython-input-207-02fecbfe3b1c> in train(SNN, dataloader, epochs, batch_size, optimiser, clip, criterion)
57
58 # calculate the loss and perform backprop
---> 59 loss = criterion(output1, output2, labels)
60 a = [[n,p, p.grad] for n,p in SNN.named_parameters()]
61
~/.conda/envs/dalkeCourse/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
539 result = self._slow_forward(*input, **kwargs)
540 else:
--> 541 result = self.forward(*input, **kwargs)
542 for hook in self._forward_hooks.values():
543 hook_result = hook(self, input, result)
<ipython-input-248-fb88b987ce71> in forward(self, output1, output2, labels)
49 return pred, loss
50
---> 51 pred, loss = estimate_loss(self.d)
52
53 if self.mode == 'training':
<ipython-input-248-fb88b987ce71> in estimate_loss(forward)
43
44 def estimate_loss(forward):
---> 45 distance = dimensional_reduction(self.d)
46 pred = torch.exp(-distance)
47 pred = torch.round(pred)
<ipython-input-248-fb88b987ce71> in dimensional_reduction(forward)
36 else:
37 if self.weighted_sum:
---> 38 self.d = self.linear(self.d)
39 else:
40 self.d = torch.mean(self.d,1)
~/.conda/envs/dalkeCourse/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
539 result = self._slow_forward(*input, **kwargs)
540 else:
--> 541 result = self.forward(*input, **kwargs)
542 for hook in self._forward_hooks.values():
543 hook_result = hook(self, input, result)
~/.conda/envs/dalkeCourse/lib/python3.6/site-packages/torch/nn/modules/linear.py in forward(self, input)
85
86 def forward(self, input):
---> 87 return F.linear(input, self.weight, self.bias)
88
89 def extra_repr(self):
~/.conda/envs/dalkeCourse/lib/python3.6/site-packages/torch/nn/functional.py in linear(input, weight, bias)
1368 if input.dim() == 2 and bias is not None:
1369 # fused op is marginally faster
-> 1370 ret = torch.addmm(bias, input, weight.t())
1371 else:
1372 output = input.matmul(weight.t())
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm
self.d is a tensor, and it has already been moved to the GPU, as shown below:
self.d =
tensor([[3.7307e-04, 8.4476e-04, 4.0426e-04, ..., 4.2015e-04, 1.7830e-04,
1.2833e-04],
[3.9271e-04, 4.8325e-04, 9.5238e-04, ..., 1.5126e-04, 1.3420e-04,
3.9260e-04],
[1.9278e-04, 2.6530e-04, 8.6903e-04, ..., 1.6985e-05, 9.5103e-05,
1.9610e-04],
...,
[1.8257e-05, 3.1304e-04, 4.6398e-04, ..., 2.7327e-04, 1.1909e-04,
1.5069e-04],
[1.7577e-04, 3.4820e-05, 9.4168e-04, ..., 3.2848e-04, 2.2514e-04,
5.4275e-05],
[4.2916e-04, 1.6155e-04, 9.3186e-04, ..., 1.0950e-04, 2.5083e-04,
3.7374e-06]], device='cuda:0', grad_fn=<AbsBackward>)
In the forward of MSE_loss, you define a linear layer that is probably still on the CPU (you did not provide an MCVE, so I can only assume):
self.linear = nn.Linear(output1.size()[0], 1)
If you want to check whether this is the problem, you can try:
self.linear = nn.Linear(output1.size()[0], 1).cuda()
However, if self.d is on the CPU, this will fail again. To handle both cases, you can move the linear layer to the same device as the self.d tensor:
def forward(self, output1, output2, labels):
    self.labels = labels
    self.linear = nn.Linear(output1.size()[0], 1)

    if self.metric == 'cos':
        self.d = F.cosine_similarity(output1, output2)
    elif self.metric == 'l1':
        self.d = torch.abs(output1-output2)
    elif self.metric == 'l2':
        self.d = torch.sqrt((output1-output2)**2)

    # move self.linear to the correct device
    self.linear = self.linear.to(self.d.device)
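A related point, offered as a sketch rather than as the asker's intended design: a layer created inside forward is rebuilt with fresh random weights on every call, so it is never trained. Creating it once in __init__ registers it as a submodule, and criterion.to(device) then moves it along with everything else. The class name WeightedDistanceLoss and the in_features argument are hypothetical. The sketch also assumes the weighted sum should run over the feature dimension (the question sizes the layer by the batch dimension), and it leaves out the rounding step and the other metrics.

```python
import torch
import torch.nn as nn


class WeightedDistanceLoss(nn.Module):
    # Hypothetical, simplified variant of MSE_loss (L1 metric only).
    def __init__(self, in_features, weighted_sum=False):
        super().__init__()
        self.weighted_sum = weighted_sum
        self.loss_function = nn.MSELoss()
        # Created once here, so it is registered as a submodule and
        # follows the criterion when .to(device) or .cuda() is called.
        self.linear = nn.Linear(in_features, 1)

    def forward(self, output1, output2, labels):
        d = torch.abs(output1 - output2)            # shape: (batch, features)
        if self.weighted_sum:
            distance = self.linear(d).squeeze(1)    # shape: (batch,)
        else:
            distance = torch.mean(d, 1)             # shape: (batch,)
        pred = torch.exp(-distance)
        return self.loss_function(pred, labels)


# Falls back to the CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
criterion = WeightedDistanceLoss(in_features=8, weighted_sum=True).to(device)

output1 = torch.randn(4, 8, device=device)
output2 = torch.randn(4, 8, device=device)
labels = torch.ones(4, device=device)
loss = criterion(output1, output2, labels)
```

If the linear weights are meant to be learned, criterion.parameters() would also need to be passed to the optimiser.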
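I ran into the same problem when building a model, and it turned out to be because I had replaced the model's fully connected layer in order to retrain it, like this: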
net.to(device)
pre_trained_model=model_path
missing_keys,unexpected_keys=net.load_state_dict(torch.load(pre_trained_model),strict=False)
net.fc=nn.Linear(inchannel,CLASSES)
Although the model had been moved to cuda, the newly defined fc had not, so the last line should be:
net.fc=nn.Linear(inchannel,CLASSES).to(device)
So check whether this is what is happening in your case.
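One way to confirm this situation is to list the device of every parameter after the layer has been replaced. The following is a minimal sketch: TinyNet and the values chosen for inchannel and CLASSES are stand-ins, not the poster's real network.

```python
import torch
import torch.nn as nn

# Falls back to the CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class TinyNet(nn.Module):
    # Hypothetical stand-in for the pre-trained network.
    def __init__(self, inchannel, classes):
        super().__init__()
        self.body = nn.Linear(16, inchannel)
        self.fc = nn.Linear(inchannel, classes)

    def forward(self, x):
        return self.fc(torch.relu(self.body(x)))


inchannel, CLASSES = 32, 5
net = TinyNet(inchannel, 10).to(device)

# Replace the head and move the new layer explicitly.
net.fc = nn.Linear(inchannel, CLASSES).to(device)

# Every parameter should now report the same device type.
param_devices = {name: p.device.type for name, p in net.named_parameters()}
out = net(torch.randn(2, 16, device=device))
```

Calling net.to(device) once more after replacing the layer has the same effect as moving the new layer on its own.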
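As a supplementary, more general answer: whenever you hit this cuda and cpu mismatch error, first check the following three points:
- Have you put your model on cuda? In other words, do you have code similar to: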
model = nn.DataParallel(model, device_ids=None).cuda()
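- Have you put your input data on cuda, for example input_data = input_data.cuda()? (For tensors, .cuda() returns a copy, so the result has to be assigned.)
- Have you put your other tensors on cuda, for example: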
loss_sum = torch.tensor([losses.sum], dtype=torch.float32, device=device)
Run through these three checks and there is a good chance your problem will be solved. Good luck.
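The three checks can be folded into one small helper that reports which device types are actually in use. This is only a sketch: devices_in_use is a hypothetical name, and the model and tensors below are placeholders.

```python
import torch
import torch.nn as nn


def devices_in_use(model, *tensors):
    # Collect the device types of every parameter and buffer in the model,
    # plus those of any extra tensors (inputs, labels, loss accumulators).
    found = {p.device.type for p in model.parameters()}
    found |= {b.device.type for b in model.buffers()}
    found |= {t.device.type for t in tensors}
    return found


# Falls back to the CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(4, 2).to(device)
input_data = torch.randn(3, 4).to(device)   # .to() returns a new tensor
loss_sum = torch.tensor([0.0], dtype=torch.float32, device=device)

used = devices_in_use(model, input_data, loss_sum)
```

If the returned set contains more than one device type, something was left behind on the other device.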
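I ran into the same problem too, and it turned out that when defining the model I should have used
customized_block = nn.ModuleList([])
instead of
customized_block = []
Modules held in a plain Python list are not registered as submodules of the nn.Module, so they are not moved to the GPU when model.cuda() is called.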
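The difference can be seen without a GPU by counting the parameters each variant exposes. A minimal sketch follows; the two class names are made up for illustration.

```python
import torch.nn as nn


class PlainListModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Layers in a plain Python list are NOT registered as submodules.
        self.customized_block = [nn.Linear(4, 4) for _ in range(2)]


class ModuleListModel(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers each layer, so .cuda() and .to() reach them.
        self.customized_block = nn.ModuleList(
            [nn.Linear(4, 4) for _ in range(2)]
        )


plain_count = len(list(PlainListModel().parameters()))        # 0
registered_count = len(list(ModuleListModel().parameters()))  # 4
```

The plain-list model exposes no parameters at all, so besides staying on the CPU its layers would also be invisible to an optimiser built from model.parameters().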