IBM XL C/C++ Compiler: "No valid target devices available"
I am using the IBM XL C/C++ compiler to build some OpenMP 4.5 code, with the goal of offloading part of its work to a GPU, like so:
xlc++ mycode.cpp -qsmp=omp -qreport -qoffload -std=c++11 -Wall
Compilation appears to succeed and gives me only the following messages:
mycode.cpp:
"mycode.cpp", line 284: 1586-358 (I) Loop was parallelized.
"mycode.cpp", line 293: 1586-358 (I) Loop was parallelized.
"mycode.cpp", line 309: 1586-358 (I) Loop was parallelized.
"mycode.cpp", line 324: 1586-358 (I) Loop was parallelized.
"mycode.cpp", line 126: 1586-674 (I) Remark: Simd or nested parallel directive requires OpenMP runtime
"" 1586-671 (I) GPU OpenMP Runtime is required for offloaded kernel '__xl__Z9MyCodeiii_l123_h44039046689_OL_1'
However, when I run the code, I get the following unpleasant message:
1587-169 No valid target devices available.
Using nvidia-smi, I have verified that the target devices are in fact available:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.59                 Driver Version: 384.59                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-SXM2...  Off  | 00000002:01:00.0 Off |                    0 |
| N/A   33C    P0    29W / 300W |     10MiB / 16276MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-SXM2...  Off  | 00000003:01:00.0 Off |                    0 |
| N/A   29C    P0    30W / 300W |     10MiB / 16276MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-SXM2...  Off  | 00000006:01:00.0 Off |                    0 |
| N/A   31C    P0    28W / 300W |     10MiB / 16276MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-SXM2...  Off  | 00000007:01:00.0 Off |                    0 |
| N/A   27C    P0    29W / 300W |     10MiB / 16276MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
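My guess is that XL is somehow targeting the wrong accelerator, but I cannot find an option to set it.
How can I get my code to recognize and use the available GPUs?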
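The -qtgtarch option specifies the GPU architecture that the code may run on. If you want the compiler to detect the architecture of device 0 on the system where the compiler is running, try -qtgtarch=auto. Alternatively, you can set it manually, for example -qtgtarch=sm_60.
For more information, see the Knowledge Center.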
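To check whether a rebuilt binary actually sees the GPUs, a small diagnostic along the following lines may help. This is only a sketch: the helper names (visible_device_count, target_region_ran_on_host, describe_offload_status) are made up for illustration, and it relies only on the standard OpenMP runtime routines omp_get_num_devices and omp_is_initial_device. It assumes the file is built with the same options as in the question plus the -qtgtarch setting described above.

```cpp
#include <string>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: number of offload devices the OpenMP runtime reports.
// Returns 0 when the file is built without OpenMP support.
int visible_device_count() {
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

// Hypothetical helper: runs a trivial target region and reports whether it
// executed on the host (true) or on an offload device (false).
bool target_region_ran_on_host() {
    int on_host = 1;  // stays 1 when built without OpenMP
#ifdef _OPENMP
    #pragma omp target map(from: on_host)
    {
        on_host = omp_is_initial_device();
    }
#endif
    return on_host != 0;
}

// Hypothetical helper: one-line summary of what the runtime can see.
std::string describe_offload_status() {
    // Query the device count first and skip the target region when it is 0,
    // so the check does not itself depend on a device being usable.
    if (visible_device_count() == 0) {
        return "no target devices visible to the OpenMP runtime";
    }
    if (target_region_ran_on_host()) {
        return "devices visible, but the target region ran on the host";
    }
    return "target region ran on an offload device";
}
```

Printing the result of describe_offload_status() from main gives a quick indication of whether the runtime finds any device at all, separately from whatever the full application does.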