How to find the NVIDIA GPU IDs for a PyTorch CUDA run setup?

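One of the main questions that young data scientists and enthusiasts ask me is how to find the GPU ID to map to in PyTorch code, for example in a line like this: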

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

The GPU IDs can be found easily with the snippet below.

import subprocess
import sys

import torch

print('__Python VERSION:', sys.version)
print('__pyTorch VERSION:', torch.__version__)
print('__CUDA VERSION')
# Capture and print the output so it shows up in notebooks as well as scripts
# (inside a notebook, `! nvcc --version` is an alternative)
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)
print('__CUDNN VERSION:', torch.backends.cudnn.version())
print('__Number CUDA Devices:', torch.cuda.device_count())
print('__Devices')
# The "index" column is the GPU ID
print(subprocess.run(["nvidia-smi", "--format=csv", "--query-gpu=index,name,driver_version,memory.total,memory.used,memory.free"], capture_output=True, text=True).stdout)
print('Active CUDA Device: GPU', torch.cuda.current_device())
print('Available devices ', torch.cuda.device_count())
print('Current cuda device ', torch.cuda.current_device())

This produces the following output:

__Python VERSION: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10) 
[GCC 10.3.0]
__pyTorch VERSION: 1.12.0a0+bd13bc6
__CUDA VERSION
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0
__CUDNN VERSION: 8400
__Number CUDA Devices: 2
__Devices
index, name, driver_version, memory.total [MiB], memory.used [MiB], memory.free [MiB]

0, Tesla V100-SXM2-32GB, 470.103.01, 32510 MiB, 3381 MiB, 29129 MiB
1, Tesla V100-SXM2-32GB, 470.103.01, 32510 MiB, 684 MiB, 31826 MiB

Active CUDA Device: GPU 0
Available devices  2
Current cuda device  0
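The CSV rows that nvidia-smi prints can also be read programmatically, for instance to pick the GPU with the most free memory. The following is only a minimal sketch: the helper names `parse_gpu_csv` and `gpu_with_most_free_memory` are hypothetical, and the sample text is copied from the output above rather than queried live.

```python
import csv
import io

# Sample copied from the nvidia-smi output above (not queried live).
SAMPLE = """index, name, driver_version, memory.total [MiB], memory.used [MiB], memory.free [MiB]
0, Tesla V100-SXM2-32GB, 470.103.01, 32510 MiB, 3381 MiB, 29129 MiB
1, Tesla V100-SXM2-32GB, 470.103.01, 32510 MiB, 684 MiB, 31826 MiB
"""


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv` output into a list of dicts (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    # Skip blank lines and trim stray whitespace around each cell.
    rows = [[cell.strip() for cell in row] for row in reader if row]
    header, records = rows[0], rows[1:]
    return [dict(zip(header, record)) for record in records]


def gpu_with_most_free_memory(gpus):
    """Return the index of the GPU reporting the most free memory (hypothetical helper)."""
    def free_mib(gpu):
        # Values look like "29129 MiB"; keep only the number.
        return int(gpu["memory.free [MiB]"].split()[0])
    return int(max(gpus, key=free_mib)["index"])


gpus = parse_gpu_csv(SAMPLE)
best = gpu_with_most_free_memory(gpus)
print("GPU with most free memory:", best)
```

With the sample above this selects GPU 1, which could then be used as `torch.device("cuda:1")`.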

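The same device list is available through pycuda:

import pycuda.driver as drv

# Initialize the CUDA driver API before querying devices
drv.init()
print("%d device(s) found." % drv.Device.count())

# Print each device's ordinal (its GPU ID) and name
for ordinal in range(drv.Device.count()):
    dev = drv.Device(ordinal)
    print(ordinal, dev.name())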

This produces the following output:

2 device(s) found.
0 Tesla V100-SXM2-32GB
1 Tesla V100-SXM2-32GB

If you get an error saying the pycuda module was not found, you can simply install it with pip.

pip install pycuda
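Once the IDs are known, one common way to map a run onto a specific GPU is the `CUDA_VISIBLE_DEVICES` environment variable. The sketch below is an illustration, not the only approach: `pin_gpus` is a hypothetical helper, and GPU 1 is just an example choice. Setting `CUDA_DEVICE_ORDER=PCI_BUS_ID` asks CUDA to number devices in the same order nvidia-smi reports them. The `torch` import is guarded so the snippet also runs where PyTorch is not installed.

```python
import os


def pin_gpus(indices):
    """Expose only the given physical GPU indices to CUDA (hypothetical helper)."""
    # PCI_BUS_ID makes CUDA number devices the same way nvidia-smi does.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in indices)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# This has to happen before CUDA is initialized; doing it before
# importing torch is the safe ordering.
visible = pin_gpus([1])
print("CUDA_VISIBLE_DEVICES =", visible)

try:
    import torch
except ImportError:
    torch = None  # PyTorch not installed; the environment variables are still set

if torch is not None:
    # Visible devices are renumbered from 0 inside this process,
    # so physical GPU 1 is addressed as cuda:0 here.
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("Using device:", device)
```

The same effect can be had from the shell by prefixing the command, for example `CUDA_VISIBLE_DEVICES=1 python train.py`, where `train.py` stands in for your own script.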