Run a Python function on a GPU using Ray

Question

I am using a Python package called Ray to run the example shown below in parallel. The code runs on a machine with 80 CPU cores and 4 GPUs.

import ray
import time

ray.init()

@ray.remote
def squared(x):
    time.sleep(1)
    y = x**2
    return y

tic = time.perf_counter()

lazy_values = [squared.remote(x) for x in range(1000)]
values = ray.get(lazy_values)

toc = time.perf_counter()

print(f'Elapsed time {toc - tic:.2f} s')
print(f'{values[:5]} ... {values[-5:]}')

ray.shutdown()

The output of the example above is:

Elapsed time 13.09 s
[0, 1, 4, 9, 16] ... [990025, 992016, 994009, 996004, 998001]

Below is the same example, but here I want to run it on a GPU using the num_gpus parameter. The GPUs available on the machine are Nvidia Tesla V100s.

import ray
import time

ray.init(num_gpus=1)

@ray.remote(num_gpus=1)
def squared(x):
    time.sleep(1)
    y = x**2
    return y

tic = time.perf_counter()

lazy_values = [squared.remote(x) for x in range(1000)]
values = ray.get(lazy_values)

toc = time.perf_counter()

print(f'Elapsed time {toc - tic:.2f} s')
print(f'{values[:5]} ... {values[-5:]}')

ray.shutdown()

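The GPU example never finished, and I terminated it after a few minutes. I checked the resources available to Ray with import ray; ray.init(); ray.available_resources(), which reported 80 CPUs and 4 GPUs. So Ray appears to be aware of the available GPUs.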

I then modified the GPU example to run fewer tasks by changing range(1000) to range(10). See the modified example below.

import ray
import time

ray.init(num_gpus=1)

@ray.remote(num_gpus=1)
def squared(x):
    time.sleep(1)
    y = x**2
    return y

tic = time.perf_counter()

lazy_values = [squared.remote(x) for x in range(10)]
values = ray.get(lazy_values)

toc = time.perf_counter()

print(f'Elapsed time {toc - tic:.2f} s')
print(f'{values[:5]} ... {values[-5:]}')

ray.shutdown()

The output of the modified GPU example is:

Elapsed time 10.06 s
[0, 1, 4, 9, 16] ... [25, 36, 49, 64, 81]

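The modified GPU example finishes, but it looks like Ray is not using the GPU in parallel. Is there something else I need to do to get Ray to run in parallel on the GPU?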

Answer 1

@ray.remote(num_gpus=1)

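This tells Ray that your function will consume an entire GPU. Because ray.init(num_gpus=1) makes only one GPU available, only one task can hold it at a time, so the tasks run serially. The documentation says you should specify a fractional value here to get parallelism:

@ray.remote(num_gpus=0.1)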

https://docs.ray.io/en/latest/using-ray-with-gpus.html
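
To make the arithmetic behind the timings concrete, here is a rough back-of-the-envelope model in plain Python. It does not call Ray. The helper names max_concurrent_tasks and estimated_seconds are hypothetical, and the model assumes that each task reserves 1 CPU by default, that a fractional GPU request must fit inside a single GPU, and that scheduling overhead is negligible.

```python
import math

def max_concurrent_tasks(total_cpus, total_gpus, cpus_per_task=1.0, gpus_per_task=0.0):
    """Rough upper bound on how many tasks fit at once (simplified model, not Ray's scheduler)."""
    eps = 1e-9  # guards against float rounding in divisions such as 1 / 0.1
    limit = math.floor(total_cpus / cpus_per_task + eps)
    if gpus_per_task > 0:
        if gpus_per_task < 1:
            # Assumption: fractional requests are packed onto individual GPUs.
            gpu_limit = total_gpus * math.floor(1 / gpus_per_task + eps)
        else:
            gpu_limit = math.floor(total_gpus / gpus_per_task + eps)
        limit = min(limit, gpu_limit)
    return limit

def estimated_seconds(n_tasks, concurrency, seconds_per_task=1.0):
    """Tasks run in waves of `concurrency`; each wave takes `seconds_per_task`."""
    return math.ceil(n_tasks / concurrency) * seconds_per_task

# Scenarios from the question: (label, total GPUs, GPUs per task, number of tasks)
scenarios = [
    ("CPU only, 1000 tasks", 4, 0.0, 1000),
    ("1 GPU, num_gpus=1, 10 tasks", 1, 1.0, 10),
    ("1 GPU, num_gpus=1, 1000 tasks", 1, 1.0, 1000),
    ("1 GPU, num_gpus=0.1, 1000 tasks", 1, 0.1, 1000),
]
for label, gpus, per_task, n in scenarios:
    c = max_concurrent_tasks(80, gpus, gpus_per_task=per_task)
    print(f"{label}: concurrency {c}, about {estimated_seconds(n, c):.0f} s")
```

Under this model the CPU-only run needs 13 waves of 80 tasks, which lines up with the measured 13.09 s, and the 10-task GPU run needs 10 serial waves, which lines up with 10.06 s. The original 1000-task GPU run would need roughly 1000 s, so it was probably still progressing when it was terminated. Note that num_gpus only reserves the resource for scheduling; a function that just calls time.sleep does no work on the GPU itself.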

