PyTorch Lightning move tensor to correct device in validation_epoch_end

Question

I would like to create a new tensor in the validation_epoch_end method of a LightningModule. According to the official docs (page 48), we should avoid direct .cuda() or .to(device) calls:

There are no .cuda() or .to() calls. . . Lightning does these for you.

Instead, we are encouraged to use the type_as method to transfer a tensor to the correct device:

new_x = new_x.type_as(x)
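As a small illustration of what type_as does, here is a minimal sketch using two hypothetical CPU tensors (the names and dtypes are assumptions chosen for the example). type_as takes a reference tensor as its argument and returns a tensor with that reference's type, which also places it on the same kind of device:

```python
import torch

# Hypothetical reference tensor; in a LightningModule this would be a batch
# tensor that Lightning has already placed on the right device.
x = torch.zeros(3, dtype=torch.float64)

# A freshly created tensor with a different dtype.
new_x = torch.ones(3, dtype=torch.float32)

# type_as expects a tensor and returns new_x converted to x's type.
new_x = new_x.type_as(x)
```

This only works when a reference tensor such as x is already at hand, which is exactly what is missing in the situation described next.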

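However, in validation_epoch_end I do not have any tensor from which to copy the device (via type_as) in a clean way.

My question is: what should I do if I want to create a new tensor in this method and transfer it to the device the model is on?

The only thing I can think of is to find a tensor in the outputs dictionary, but that feels a bit messy: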

avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
output = self(self.__test_input.type_as(avg_loss))

Is there a clean way to achieve this?

Answer 1

Did you check section 3.4 (page 34) of the docs you linked?

LightningModules know what device they are on! construct tensors on the device directly to avoid CPU->Device transfer

t = torch.rand(2, 2).cuda()  # bad: created on the CPU, then copied to the GPU
# self is the LightningModule
t = torch.rand(2, 2, device=self.device)  # good: created directly on the module's device
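Applied to the question, this could look roughly like the sketch below. It is not taken from the docs: to keep it runnable without Lightning installed, it uses a plain torch.nn.Module with a hypothetical device property standing in for the self.device that a LightningModule already provides. The class name, layer sizes, and the random test input are assumptions made for illustration.

```python
import torch
from torch import nn


class TinyModule(nn.Module):
    """Plain nn.Module stand-in; a real LightningModule already has `self.device`."""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 1)

    @property
    def device(self):
        # Hypothetical helper mimicking LightningModule.device:
        # report the device of the first parameter.
        return next(self.parameters()).device

    def forward(self, x):
        return self.layer(x)

    def validation_epoch_end(self, outputs):
        avg_loss = torch.stack([o["val_loss"] for o in outputs]).mean()
        # Create the new tensor directly on the module's device,
        # so no reference tensor from `outputs` is needed.
        test_input = torch.rand(2, 4, device=self.device)
        prediction = self(test_input)
        return {"val_loss": avg_loss, "prediction": prediction}


model = TinyModule()
outputs = [{"val_loss": torch.tensor(0.5)}, {"val_loss": torch.tensor(1.5)}]
result = model.validation_epoch_end(outputs)
```

In an actual LightningModule the device property would be dropped and self.device used as is.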

I had a similar issue when creating tensors, and this helped me. I hope it helps you too.
