Lists of PyTorch Lightning sub-models don't get transferred to GPU

Question

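When using PyTorch Lightning on CPU, everything works fine. However, when using GPUs, I get RuntimeError: Expected all tensors to be on the same device.

The problem seems to come from the model, which uses a list of sub-models that doesn't get passed to the GPU: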

class LambdaLayer(LightningModule):
    def __init__(self, fun):
        super(LambdaLayer, self).__init__()
        self.fun = fun

    def forward(self, x):
        return self.fun(x)

class TorchModel(LightningModule):
    def __init__(self):
        super(TorchModel, self).__init__()
        self.cat_layers = [TorchCatEmbedding(cat) for cat in columns_to_embed]
        self.num_layers = [LambdaLayer(lambda x: x[:, idx:idx+1]) for _, idx in numeric_columns]
        self.ffo = TorchFFO(len(self.num_layers) + sum([embed_dim(l) for l in self.cat_layers]), y.shape[1])
        self.softmax = torch.nn.Softmax(dim=1)

model = TorchModel()
trainer = Trainer(gpus=-1)

Before running trainer(model):

>>> model.device
device(type='cpu')

>>> model.ffo.device
device(type='cpu')

>>> model.cat_layers[0].device
device(type='cpu')

After running trainer(model):

>>> model.device
device(type='cuda', index=0) # <---- correct

>>> model.ffo.device
device(type='cuda', index=0) # <---- correct

>>> model.cat_layers[0].device
device(type='cpu') # <---- still showing 'cpu'

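Apparently PyTorch Lightning is not able to transfer the lists of sub-models to the GPU. How can I move the whole model, including the lists of sub-models (cat_layers and num_layers), to the GPU?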

Answer 1

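Submodules held in a plain Python list are not registered with the parent module, so they are not converted along with the rest of the model. You need to use a ModuleList instead, i.e.: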

...
from torch.nn import ModuleList
...

class TorchModel(LightningModule):
    def __init__(self):
        super(TorchModel, self).__init__()
        self.cat_layers = ModuleList([TorchCatEmbedding(cat) for cat in columns_to_embed])
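        # idx=idx binds the current index to each lambda; without it, every lambda would slice the last idx
        self.num_layers = ModuleList([LambdaLayer(lambda x, idx=idx: x[:, idx:idx+1]) for _, idx in numeric_columns])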
        self.ffo = TorchFFO(len(self.num_layers) + sum([embed_dim(l) for l in self.cat_layers]), y.shape[1])
        self.softmax = torch.nn.Softmax(dim=1)
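
The difference can be checked without a GPU. The following is a minimal sketch, not the asker's model: it assumes plain torch.nn.Module in place of LightningModule (which subclasses it), uses hypothetical class names PlainListModel and ModuleListModel, and uses a dtype conversion as a stand-in for the device transfer, since .to() only reaches registered submodules in both cases.

```python
import torch
from torch import nn


class PlainListModel(nn.Module):
    """Hypothetical stand-in: layers kept in a plain Python list."""

    def __init__(self):
        super().__init__()
        # A plain list is just an attribute; the layers are NOT registered.
        self.layers = [nn.Linear(4, 4) for _ in range(3)]


class ModuleListModel(nn.Module):
    """Hypothetical stand-in: the same layers wrapped in nn.ModuleList."""

    def __init__(self):
        super().__init__()
        # ModuleList registers every layer as a submodule of this module.
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])


plain = PlainListModel()
registered = ModuleListModel()

# Only registered submodules contribute to parameters().
n_plain = len(list(plain.parameters()))
n_registered = len(list(registered.parameters()))
print(n_plain, n_registered)  # 0 6 (three layers, each with weight and bias)

# .to() walks registered submodules only; dtype is used here instead of a device.
plain.to(torch.float64)
registered.to(torch.float64)
print(plain.layers[0].weight.dtype)       # torch.float32 (left behind)
print(registered.layers[0].weight.dtype)  # torch.float64 (converted)
```

The same mechanism would explain the symptom in the question: unregistered layers are skipped when the model is moved, and their parameters are also invisible to an optimizer built from model.parameters().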

Edit: I'm not sure what the Lightning equivalent is, or whether one exists; see also "PyTorch Lightning - LightningModule for ModuleList / ModuleDict?"
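
Because LightningModule subclasses torch.nn.Module, the standard torch.nn containers should work inside it unchanged. For sub-models that are more naturally looked up by name than by position, ModuleDict registers them the same way. A minimal sketch follows, assuming a hypothetical EmbeddingBank class, made-up column names and cardinalities, and plain nn.Embedding layers in place of the question's TorchCatEmbedding:

```python
import torch
from torch import nn


class EmbeddingBank(nn.Module):
    """Hypothetical example: one embedding per categorical column, keyed by name."""

    def __init__(self, cardinalities, dim=3):
        super().__init__()
        # ModuleDict registers each embedding, so .to(), .cuda() and
        # parameters() all see them.
        self.embeddings = nn.ModuleDict(
            {name: nn.Embedding(n, dim) for name, n in cardinalities.items()}
        )

    def forward(self, inputs):
        # Iterating a ModuleDict yields its keys in insertion order.
        parts = [self.embeddings[name](inputs[name]) for name in self.embeddings]
        return torch.cat(parts, dim=1)


# Column names and sizes below are illustrative assumptions.
bank = EmbeddingBank({"color": 5, "size": 3})
batch = {"color": torch.tensor([0, 4]), "size": torch.tensor([1, 2])}
out = bank(batch)
print(out.shape)  # torch.Size([2, 6])
```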
