Pytorch CPU 没有 gpu 的 CUDA 设备加载

Question

我找到了这个很棒的代码 Pytorch mobilenet，我无法在 CPU 上获得运行。 https://github.com/rdroste/unisal

我是 Pytorch 的新手，所以我不知道该怎么做。

在模块的第 174 行 train.py 设置了设备：

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

就我对 Pytorch 的了解而言，这是正确的。

我也必须更改 torch.load 吗？我试过了没有成功。

class BaseModel(nn.Module):
    """Abstract model class with functionality to save and load weights"""

    def forward(self, *input):
        raise NotImplementedError

    def save_weights(self, directory, name):
        torch.save(self.state_dict(), directory / f'weights_{name}.pth')

    def load_weights(self, directory, name):
        self.load_state_dict(torch.load(directory / f'weights_{name}.pth'))

    def load_best_weights(self, directory):
        self.load_state_dict(torch.load(directory / f'weights_best.pth'))

    def load_epoch_checkpoint(self, directory, epoch):
        """Load state_dict from a Trainer checkpoint at a specific epoch"""
        chkpnt = torch.load(directory / f"chkpnt_epoch{epoch:04d}.pth")
        self.load_state_dict(chkpnt['model_state_dict'])

    def load_checkpoint(self, file):
        """Load state_dict from a specific Trainer checkpoint"""
        """Load """
        chkpnt = torch.load(file)
        self.load_state_dict(chkpnt['model_state_dict'])

    def load_last_chkpnt(self, directory):
        """Load state_dict from the last Trainer checkpoint"""
        last_chkpnt = sorted(list(directory.glob('chkpnt_epoch*.pth')))[-1]
        self.load_checkpoint(last_chkpnt)

我不明白。我必须在哪里告诉 pytorch 没有 gpu？

完整错误：

Traceback (most recent call last):
  File "run.py", line 99, in <module>
    fire.Fire()

  File "/home/b256/anaconda3/envs/unisal36/lib/python3.6/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)

  File "/home/b256/anaconda3/envs/unisal36/lib/python3.6/site-packages/fire/core.py", line 471, in _Fire
    target=component.__name__)

  File "/home/b256/anaconda3/envs/unisal36/lib/python3.6/site-packages/fire/core.py", line 675, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)

  File "run.py", line 95, in predict_examples
    example_folder, is_video, train_id=train_id, source=source)

  File "run.py", line 72, in predictions_from_folder
    folder_path, is_video, source=source, model_domain=model_domain)

  File "/home/b256/Data/saliency_models/unisal-master/unisal/train.py", line 871, in generate_predictions_from_path
    self.model.load_best_weights(self.train_dir)

  File "/home/b256/Data/saliency_models/unisal-master/unisal/train.py", line 1057, in model
    self._model = model_cls(**self.model_cfg)

  File "/home/b256/Data/saliency_models/unisal-master/unisal/model.py", line 190, in __init__
    self.cnn = MobileNetV2(**self.cnn_cfg)

  File "/home/b256/Data/saliency_models/unisal-master/unisal/models/MobileNetV2.py", line 156, in __init__
    Path(__file__).resolve().parent / 'weights/mobilenet_v2.pth.tar')

  File "/home/b256/anaconda3/envs/unisal36/lib/python3.6/site-packages/torch/serialization.py", line 367, in load
    return _load(f, map_location, pickle_module)

  File "/home/b256/anaconda3/envs/unisal36/lib/python3.6/site-packages/torch/serialization.py", line 538, in _load
    result = unpickler.load()

  File "/home/b256/anaconda3/envs/unisal36/lib/python3.6/site-packages/torch/serialization.py", line 504, in persistent_load
    data_type(size), location)

  File "/home/b256/anaconda3/envs/unisal36/lib/python3.6/site-packages/torch/serialization.py", line 113, in default_restore_location
    result = fn(storage, location)

  File "/home/b256/anaconda3/envs/unisal36/lib/python3.6/site-packages/torch/serialization.py", line 94, in _cuda_deserialize
    device = validate_cuda_device(location)

  File "/home/b256/anaconda3/envs/unisal36/lib/python3.6/site-packages/torch/serialization.py", line 78, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location='cpu' to map your storages to the CPU.

Answer 1

在 https://pytorch.org/tutorials/beginner/saving_loading_models.html#save-on-gpu-load-on-cpu 中，您会看到有一个 map_location 关键字参数用于将权重发送到正确的设备：

model.load_state_dict(torch.load(PATH, map_location=device))

来自文档 https://pytorch.org/docs/stable/generated/torch.load.html#torch.load

torch.load() uses Python’s unpickling facilities but treats storages, which underlie tensors, specially. They are first deserialized on the CPU and are then moved to the device they were saved from. If this fails (e.g. because the run time system doesn’t have certain devices), an exception is raised. However, storages can be dynamically remapped to an alternative set of devices using the map_location argument.

If map_location is a callable, it will be called once for each serialized storage with two arguments: storage and location. The storage argument will be the initial deserialization of the storage, residing on the CPU. Each serialized storage has a location tag associated with it which identifies the device it was saved from, and this tag is the second argument passed to map_location. The builtin location tags are 'cpu' for CPU tensors and 'cuda:device_id' (e.g. 'cuda:2') for CUDA tensors. map_location should return either None or a storage. If map_location returns a storage, it will be used as the final deserialized object, already moved to the right device. Otherwise, torch.load() will fall back to the default behavior, as if map_location wasn’t specified.

If map_location is a torch.device object or a string containing a device tag, it indicates the location where all tensors should be loaded.

Otherwise, if map_location is a dict, it will be used to remap location tags appearing in the file (keys), to ones that specify where to put the storages (values).

Pytorch CPU 没有 gpu 的 CUDA 设备加载

Pytorch CPU CUDA device load without gpu

python

gpu

pytorch