Lists of PyTorch Lightning sub-models don't get transferred to GPU
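When I use PyTorch Lightning on the CPU, everything works fine. On GPUs, however, I get RuntimeError: Expected all tensors to be on the same device.
The problem seems to come from the model, which uses lists of sub-models that are not moved to the GPU: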
import torch
from pytorch_lightning import LightningModule, Trainer

# TorchCatEmbedding, TorchFFO, embed_dim, columns_to_embed, numeric_columns and y
# are defined elsewhere in the asker's code (not shown).

class LambdaLayer(LightningModule):
    def __init__(self, fun):
        super(LambdaLayer, self).__init__()
        self.fun = fun

    def forward(self, x):
        return self.fun(x)

class TorchModel(LightningModule):
    def __init__(self):
        super(TorchModel, self).__init__()
        self.cat_layers = [TorchCatEmbedding(cat) for cat in columns_to_embed]
        self.num_layers = [LambdaLayer(lambda x: x[:, idx:idx+1]) for _, idx in numeric_columns]
        self.ffo = TorchFFO(len(self.num_layers) + sum([embed_dim(l) for l in self.cat_layers]), y.shape[1])
        self.softmax = torch.nn.Softmax(dim=1)

model = TorchModel()
trainer = Trainer(gpus=-1)
Before running trainer.fit(model):
>>> model.device
device(type='cpu')
>>> model.ffo.device
device(type='cpu')
>>> model.cat_layers[0].device
device(type='cpu')
After running trainer.fit(model):
>>> model.device
device(type='cuda', index=0) # <---- correct
>>> model.ffo.device
device(type='cuda', index=0) # <---- correct
>>> model.cat_layers[0].device
device(type='cpu') # <---- still showing 'cpu'
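Apparently PyTorch Lightning is unable to transfer the lists of sub-models to the GPU. How can I move the whole model, including the lists of sub-models (cat_layers and num_layers), to the GPU?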
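Submodules stored in a plain Python list are not registered with the parent module, so they are not converted when the model is moved to another device.
You need to use ModuleList instead, i.e.: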
...
from torch.nn import ModuleList
...
class TorchModel(LightningModule):
    def __init__(self):
        super(TorchModel, self).__init__()
        self.cat_layers = ModuleList([TorchCatEmbedding(cat) for cat in columns_to_embed])
        self.num_layers = ModuleList([LambdaLayer(lambda x: x[:, idx:idx+1]) for _, idx in numeric_columns])
        self.ffo = TorchFFO(len(self.num_layers) + sum([embed_dim(l) for l in self.cat_layers]), y.shape[1])
        self.softmax = torch.nn.Softmax(dim=1)
Edit: I'm not sure what the Lightning equivalent is, or whether one exists at all; see also "PyTorch Lightning - LightningModule for ModuleList / ModuleDict?".
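The difference between a plain list and a ModuleList can be checked without a GPU. The following is a minimal sketch, not the asker's model: it uses plain torch.nn.Module instead of LightningModule, and the class names PlainListModel and ModuleListModel are hypothetical. It relies on the assumption that a dtype conversion (.double()) is a fair stand-in for a device move, since both walk the registered submodules only.

```python
import torch
from torch import nn


class PlainListModel(nn.Module):
    """Hypothetical model that keeps its layers in a plain Python list."""

    def __init__(self):
        super().__init__()
        # Plain list: the Linear layers are NOT registered as submodules.
        self.layers = [nn.Linear(4, 4) for _ in range(2)]
        self.head = nn.Linear(4, 1)


class ModuleListModel(nn.Module):
    """Hypothetical model that keeps its layers in an nn.ModuleList."""

    def __init__(self):
        super().__init__()
        # ModuleList: every contained layer is registered as a submodule.
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(2)])
        self.head = nn.Linear(4, 1)


plain = PlainListModel()
registered = ModuleListModel()

# parameters() only yields tensors that belong to registered submodules.
plain_param_count = len(list(plain.parameters()))
registered_param_count = len(list(registered.parameters()))

# Convert both models; this stands in for .to("cuda") on a CPU-only machine.
plain.double()
registered.double()

plain_layer_dtype = plain.layers[0].weight.dtype
registered_layer_dtype = registered.layers[0].weight.dtype

print("plain list:", plain_param_count, "parameters, layer dtype", plain_layer_dtype)
print("ModuleList:", registered_param_count, "parameters, layer dtype", registered_layer_dtype)
```

In the plain-list model only the head is converted and the listed layers keep their original dtype, which mirrors the cat_layers[0].device output in the question. The same unregistered layers would also be missing from model.parameters(), so an optimizer built from it would not train them.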