Wrong GPU order in MXNet and TensorFlow
My desktop has two GPUs installed: a 1080 and a 1080 Ti.
nvidia-smi shows GPU 0 as the 1080 and GPU 1 as the 1080 Ti:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:01:00.0 Off |                  N/A |
| 26%   57C    P2    53W / 215W |    696MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 55%   70C    P2   204W / 250W |   8641MiB / 11178MiB |     28%      Default |
+-------------------------------+----------------------+----------------------+
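However, TensorFlow and MXNet use the opposite order: when I specify gpu=0 I get the 1080 Ti, and when I specify gpu=1 I get the 1080.
Why does this happen, and how can I make the TensorFlow and MXNet GPU order match the nvidia-smi order?
MXNet code snippet: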
mod = mx.mod.Module(symbol, label_names=None, context=mx.gpu(0))
For TensorFlow I use the environment variable
CUDA_VISIBLE_DEVICES="0"
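Set CUDA_DEVICE_ORDER so that CUDA enumerates devices in the same order as nvidia-smi:
export CUDA_DEVICE_ORDER=PCI_BUS_ID
By default CUDA uses FASTEST_FIRST ordering, which puts the faster card (here the 1080 Ti) at index 0, whereas nvidia-smi lists devices by PCI bus ID.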
See also this question.
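If exporting the variable in the shell is inconvenient, the same setting can probably be applied from inside the Python script. The following is only a minimal sketch: pin_gpu_order is a hypothetical helper name, not part of MXNet or TensorFlow, and it assumes the variables are set before the framework initializes CUDA (setting them before the import is the safe choice).

```python
import os


def pin_gpu_order(visible_devices):
    """Hypothetical helper: ask CUDA to enumerate GPUs in PCI bus order.

    Call this before importing tensorflow or mxnet, since CUDA reads
    these environment variables when it initializes.
    """
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices


# With PCI bus ordering, "0" should refer to nvidia-smi's GPU 0
# (the 1080 on the machine described in the question).
pin_gpu_order("0")

# Import the framework only after the variables are set, for example:
# import mxnet as mx
# mod = mx.mod.Module(symbol, label_names=None, context=mx.gpu(0))
```

Note that once CUDA_VISIBLE_DEVICES hides the other card, the remaining device is always index 0 inside the process, so mx.gpu(0) refers to whichever GPU was left visible.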