How to enable Keras with Theano to utilize multiple GPUs
Setup:
- Using an Amazon Linux system with Nvidia GPUs
- I'm using Keras 1.0.1
- Running the Theano v0.8.2 backend
- Using CUDA and cuDNN
- THEANO_FLAGS="device=gpu,floatX=float32,lib.cnmem=1"
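For context, these flags are set in the environment when the training script is launched, and device=gpu pins all computation to a single default device (gpu0). A minimal launch line (the script name is just a placeholder):

THEANO_FLAGS="device=gpu,floatX=float32,lib.cnmem=1" python train.py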
Everything works fine, but when I increase the batch size to speed up training, I run out of GPU memory on large models. I figured that moving to a 4-GPU system would, in theory, either increase the total memory available or let smaller batches build faster, but watching the nvidia stats I can see that only one GPU is used by default:
+------------------------------------------------------+
| NVIDIA-SMI 361.42     Driver Version: 361.42         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
| N/A   44C    P0    45W / 125W |   3954MiB /  4095MiB |     94%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GRID K520           Off  | 0000:00:04.0     Off |                  N/A |
| N/A   28C    P8    17W / 125W |     11MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GRID K520           Off  | 0000:00:05.0     Off |                  N/A |
| N/A   32C    P8    17W / 125W |     11MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GRID K520           Off  | 0000:00:06.0     Off |                  N/A |
| N/A   29C    P8    17W / 125W |     11MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      9862     C  python34                                      3941MiB |
+-----------------------------------------------------------------------------+
I know that with raw Theano you can explicitly use multiple GPUs by hand. Does Keras support the use of multiple GPUs? If so, does it abstract this away, or do you need to map the GPUs to devices as in Theano and explicitly marshal computations to specific GPUs?
Multi-GPU training in Theano is experimental ("The code is rather new and is still considered experimental at this point. It has been tested and seems to perform correctly in all cases observed, but make sure to double-check your results before publishing a paper or anything of the sort.") and hasn't been integrated into Keras yet. However, you can use multiple GPUs with Keras on the TensorFlow backend: https://blog.keras.io/keras-as-a-simplified-interface-to-tensorflow-tutorial.html#multi-gpu-and-distributed-training
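For reference, raw Theano's experimental multi-GPU support (the feature the quote above refers to) works by declaring named device contexts and pinning shared variables to them; it requires the new gpuarray back-end (libgpuarray). A minimal sketch adapted from the Theano documentation, launched with something like (the script name is a placeholder):

THEANO_FLAGS="contexts=dev0->cuda0;dev1->cuda1" python multi_gpu_test.py

import numpy
import theano
import theano.tensor as T

# Each shared variable is pinned to a named context ("dev0"/"dev1"),
# which THEANO_FLAGS above maps to physical GPUs.
v01 = theano.shared(numpy.random.random((1024, 1024)).astype('float32'),
                    target='dev0')
v02 = theano.shared(numpy.random.random((1024, 1024)).astype('float32'),
                    target='dev0')
v11 = theano.shared(numpy.random.random((1024, 1024)).astype('float32'),
                    target='dev1')
v12 = theano.shared(numpy.random.random((1024, 1024)).astype('float32'),
                    target='dev1')

# The two dot products can then be scheduled on different GPUs.
f = theano.function([], [T.dot(v01, v02), T.dot(v11, v12)])
f()

With the TensorFlow backend, the pattern from the linked tutorial is to place pieces of the graph on different GPUs with tf.device and combine the results on the CPU. A rough data-parallel sketch in Keras 1.x style (the layer sizes, tower structure, and names here are illustrative, not the tutorial's exact code):

import tensorflow as tf
from keras.layers import Input, Dense, merge
from keras.models import Model

# Keep the input on the CPU so data only crosses to each GPU once.
with tf.device('/cpu:0'):
    x = Input(shape=(784,))

# One "tower" per GPU; in a real data-parallel setup each tower
# would process its own slice of the batch.
with tf.device('/gpu:0'):
    tower_0 = Dense(128, activation='relu')(x)
with tf.device('/gpu:1'):
    tower_1 = Dense(128, activation='relu')(x)

# Combine the towers on the CPU and finish the model there.
with tf.device('/cpu:0'):
    merged = merge([tower_0, tower_1], mode='concat')
    out = Dense(10, activation='softmax')(merged)

model = Model(input=x, output=out)
model.compile(optimizer='sgd', loss='categorical_crossentropy')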