How to run a Python script on a discrete AMD GPU?
WHAT I WANT TO DO:
I have a script that finds all the prime numbers within a given interval:
# Python program to display all the prime numbers within an interval
lower = 900
upper = 1000
print("Prime numbers between", lower, "and", upper, "are:")
for num in range(lower, upper + 1):
    # all prime numbers are greater than 1
    if num > 1:
        for i in range(2, num):
            if (num % i) == 0:
                break
        else:
            print(num)
I would like to run this kind of script on the GPU instead of the CPU, so that it runs faster.
THE PROBLEM:
I don't have an NVIDIA GPU on my Intel NUC NUC8i7HVK, but a "Discrete GPU".
If I run this code to check what my GPUs are:
import pyopencl as cl
import numpy as np
a = np.arange(32).astype(np.float32)
res = np.empty_like(a)
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
dest_buf = cl.Buffer(ctx, mf.WRITE_ONLY, res.nbytes)
prg = cl.Program(ctx, """
    __kernel void sq(__global const float *a,
                     __global float *c)
    {
      int gid = get_global_id(0);
      c[gid] = a[gid] * a[gid];
    }
    """).build()
prg.sq(queue, a.shape, None, a_buf, dest_buf)
cl.enqueue_copy(queue, res, dest_buf)
print (a, res)
I get:
[0] <pyopencl.Platform 'AMD Accelerated Parallel Processing' at 0x7ffb3d492fd0>
[1] <pyopencl.Platform 'Intel(R) OpenCL HD Graphics' at 0x187b648ed80>
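As a side note, a short pyopencl listing such as the following (not part of the original post) shows which platform index maps to which physical device:

import pyopencl as cl

# Print every OpenCL platform and the devices it exposes, so you can see
# which index PYOPENCL_CTX (or the interactive prompt) refers to.
for i, platform in enumerate(cl.get_platforms()):
    print(i, platform.name)
    for device in platform.get_devices():
        print("   ", device.name)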
THE POSSIBLE APPROACH TO THE PROBLEM:
I found a guide that takes you by the hand and explains step by step how to run it on your GPU. But all the Python libraries that pipe Python through the GPU, such as PyOpenGL, PyOpenCL, TensorFlow, PyTorch, etc., are tailored for NVIDIA.
If you have an AMD GPU, all of those libraries ask for ROCm, but as far as I know that software still does not support integrated or discrete GPUs (see my own reply below).
I only found one guide that discusses this approach, but I could not make it work.
Is there hope, or am I just trying to do something impossible?
EDIT: Reply to @chapelo
If I choose 0, the reply is:
Set the environment variable PYOPENCL_CTX='0' to avoid being asked again.
[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17.
18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31.] [ 0. 1. 4. 9. 16. 25. 36. 49. 64. 81. 100. 121. 144. 169.
196. 225. 256. 289. 324. 361. 400. 441. 484. 529. 576. 625. 676. 729.
784. 841. 900. 961.]
If I choose 1, the reply is:
Set the environment variable PYOPENCL_CTX='1' to avoid being asked again.
[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17.
18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31.] [ 0. 1. 4. 9. 16. 25. 36. 49. 64. 81. 100. 121. 144. 169.
196. 225. 256. 289. 324. 361. 400. 441. 484. 529. 576. 625. 676. 729.
784. 841. 900. 961.]
After a lot of research and several attempts, I came to the following conclusions:
- PyOpenGL: works mainly with NVIDIA. If you have an AMD GPU, you need to install ROCm
- PyOpenCL: works mainly with NVIDIA. If you have an AMD GPU, you need to install ROCm
- TensorFlow: works mainly with NVIDIA. If you have an AMD GPU, you need to install ROCm
- PyTorch: works mainly with NVIDIA. If you have an AMD GPU, you need to install ROCm
I installed ROCm, but if I run rocminfo it returns:
ROCk module is NOT loaded, possibly no GPU devices
Unable to open /dev/kfd read-write: No such file or directory
Failed to get user name to check for video group membership
hsa api call failure at: /src/rocminfo/rocminfo.cc:1142
Call returned HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.
clinfo returns:
Number of platforms 1
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.0 AMD-APP (3212.0)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback
Platform Extensions function suffix AMD
Platform Name AMD Accelerated Parallel Processing
Number of devices 0
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No devices found in platform
rocm-smi returns:
Segmentation fault
This is because the official guide states that "integrated GPUs of Ryzen are not officially supported targets for ROCm." Since mine is an integrated GPU, I am out of scope.
I will not waste any more time on this and will probably buy an NVIDIA or an AMD eGPU (external GPU).
pyopencl does work with both your AMD and your Intel GPUs, and you have already verified that your installation is working. Just set the environment variable PYOPENCL_CTX='0' to always use the AMD GPU without being asked.
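For example, a minimal sketch of setting it from inside the script itself (it can equally be exported in the shell before launching Python):

import os
os.environ["PYOPENCL_CTX"] = "0"   # 0 = the AMD platform in your listing

import pyopencl as cl
ctx = cl.create_some_context()     # no interactive prompt this time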
Or, instead of using ctx = cl.create_some_context(), you can define the context in your program with:
platforms = cl.get_platforms()
ctx = cl.Context(
    dev_type=cl.device_type.ALL,
    properties=[(cl.context_properties.PLATFORM, platforms[0])])
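If you prefer to select the device explicitly as well, and assuming platforms[0] is the AMD platform as in your output, something along these lines should also work:

platforms = cl.get_platforms()
gpu_devices = platforms[0].get_devices(device_type=cl.device_type.GPU)
ctx = cl.Context(devices=gpu_devices)   # build the context on the AMD GPU only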
Don't take it for granted that your AMD GPU will beat your Intel one in every case. I have seen cases where the Intel one outperforms the other. I think it is related to the cost of copying the data out of the CPU to the other GPU.
That said, I think running your script in parallel will not be much of an improvement compared to using a better algorithm:
- Use a sieve algorithm to get the primes up to the square root of your upper number.
- Apply a similar sieve algorithm to sieve the numbers from the lower to the upper bound, using the primes from the previous step.
Maybe this is not a good example of a problem that can easily be run in parallel, but now you are all set to try another example.
However, to show you how to solve this problem using your GPU, consider the following changes:
The serial algorithm would be something like the following:
from math import sqrt

def primes_below(number):
    n = lambda a: 2 if a==0 else 2*a + 1
    limit = int(sqrt(number)) + 1
    size = number//2
    primes = [True] * size
    for i in range(1, size):
        if primes[i]:
            num = n(i)
            for j in range(i+num, size, num):
                primes[j] = False
    for i, flag in enumerate(primes):
        if flag: yield n(i)

def primes_between(lo, hi):
    primes = list(primes_below(int(sqrt(hi))+1))
    size = (hi - lo - (0 if hi%2 else 1))//2 + 1
    n = lambda a: 2*a + lo + (0 if lo%2 else 1)
    numbers = [True]*size
    for i, prime in enumerate(primes):
        if i == 0: continue
        start = 0
        while (n(start)%prime) != 0:
            start += 1
        for j in range(start, size, prime):
            numbers[j] = False
    for i, flag in enumerate(numbers):
        if flag: yield n(i)
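As a quick sanity check (a usage example added here, assuming the two functions above are defined), this reproduces the output of your original script:

print(list(primes_between(900, 1000)))
# should print [907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997]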
The serial version above prints the list of primes between 1e6 and 5e6 in 0.64 seconds.
Trying your script on my GPU did not finish within 5 minutes. For a problem 10 times smaller, the primes between 1e5 and 5e5, it took about 29 seconds.
Modifying the script so that each work item in the GPU divides one odd number (testing the even ones is pointless) by a series of pre-computed primes up to the square root of the upper number, stopping as soon as a prime exceeds the square root of the number itself, it finishes the same task in 0.50 seconds. That is an improvement!
The code is the following:
import numpy as np
import pyopencl as cl
import pyopencl.algorithm
import pyopencl.array

def primes_between_using_cl(lo, hi):
    primes = list(primes_below(int(sqrt(hi))+1))
    numbers_h = np.arange( lo + (0 if lo&1 else 1),
                           hi + (0 if hi&1 else 1),
                           2,
                           dtype=np.int32)
    size = (hi - lo - (0 if hi%2 else 1))//2 + 1
    code = """\
    __kernel
    void is_prime( __global const int *primes,
                   __global int *numbers) {
        int gid = get_global_id(0);
        int num = numbers[gid];
        int max = (int) (sqrt((float)num) + 1.0);
        for (; *primes; ++primes) {
            if (*primes <= max && num % *primes == 0) {
                numbers[gid] = 0;
                return;
            }
        }
    }
    """
    platforms = cl.get_platforms()
    ctx = cl.Context(dev_type=cl.device_type.ALL,
                     properties=[(cl.context_properties.PLATFORM, platforms[0])])
    queue = cl.CommandQueue(ctx)
    prg = cl.Program(ctx, code).build()
    numbers_d = cl.array.to_device(queue, numbers_h)
    primes_d = cl.array.to_device(queue,
                                  np.array(primes[1:],  # don't need 2
                                           dtype=np.int32))
    prg.is_prime(queue, (size, ), None, primes_d.data, numbers_d.data)
    array, length = cl.algorithm.copy_if(numbers_d, "ary[i]>0")[:2]
    yield from array.get()[:length.get()]
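A small timing harness along these lines can be used to reproduce the comparison above (time_it is just a hypothetical helper, and the exact figures will depend on your hardware):

import time

def time_it(fn, lo, hi):
    """Exhaust the generator returned by fn and report the elapsed time."""
    start = time.perf_counter()
    count = sum(1 for _ in fn(lo, hi))
    print("{}: {} primes in {:.2f} s".format(fn.__name__, count, time.perf_counter() - start))

time_it(primes_between, 1_000_000, 5_000_000)           # serial sieve
time_it(primes_between_using_cl, 1_000_000, 5_000_000)  # pyopencl version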
The following code is an example of a complete Python program, which usually includes:
- the import statements
- the function definitions
- the main() function
- the if __name__ == "__main__": section.
I hope this helps you to solve your problem.
import pyprimes
from math import sqrt

import numpy as np
import pyopencl as cl
import pyopencl.algorithm
import pyopencl.array


def primes_below(number):
    """Generate a list of prime numbers below a specified `number`"""
    n = lambda a: 2 if a==0 else 2*a + 1
    limit = int(sqrt(number)) + 1
    size = number//2
    primes = [True] * size
    for i in range(1, size):
        if primes[i]:
            num = n(i)
            if num > limit: break
            for j in range(i+num, size, num):
                primes[j] = False
    for i, flag in enumerate(primes):
        if flag:
            yield n(i)


def primes_between(lo, hi):
    """Generate a list of prime numbers between `lo` and `hi` numbers"""
    primes = list(primes_below(int(sqrt(hi))+1))
    size = (hi - lo - (0 if hi%2 else 1))//2 + 1
    n = lambda a: 2*a + lo + (0 if lo%2 else 1)
    numbers = [True]*size
    for i, prime in enumerate(primes):
        if i == 0: continue  # avoid dividing by 2
        nlo = n(0)
        # slower # start = prime * (nlo//prime + 1) if nlo%prime else 0
        start = 0
        while (n(start)%prime) != 0:
            start += 1
        for j in range(start, size, prime):
            numbers[j] = False
    for i, flag in enumerate(numbers):
        if flag:
            yield n(i)


def primes_between_using_cl(lo, hi):
    """Generate a list of prime numbers between `lo` and `hi` numbers
    this is a parallel algorithm using pyopencl"""
    primes = list(primes_below(int(sqrt(hi))+1))
    size_primes_h = np.array( (len(primes)-1, ), dtype=np.int32)
    numbers_h = np.arange( lo + (0 if lo&1 else 1),
                           hi + (0 if hi&1 else 1),
                           2,
                           dtype=np.int32)
    size = (hi - lo - (0 if hi%2 else 1))//2 + 1
    code = """\
    __kernel
    void is_prime( __global const int *primes,
                   __global int *numbers) {
        int gid = get_global_id(0);
        int num = numbers[gid];
        int max = (int) (sqrt((float)num) + 1.0);
        for (; *primes; ++primes) {
            if (*primes > max) break;
            if (num % *primes == 0) {
                numbers[gid] = 0;
                return;
            }
        }
    }
    """
    platforms = cl.get_platforms()
    ctx = cl.Context(dev_type=cl.device_type.ALL,
                     properties=[(cl.context_properties.PLATFORM, platforms[0])])
    queue = cl.CommandQueue(ctx)
    prg = cl.Program(ctx, code).build()
    numbers_d = cl.array.to_device(queue, numbers_h)
    primes_d = cl.array.to_device(queue, np.array(primes[1:], dtype=np.int32))
    prg.is_prime(queue, (size, ), None, primes_d.data, numbers_d.data)
    array, length = cl.algorithm.copy_if(numbers_d, "ary[i]>0")[:2]
    yield from array.get()[:length.get()]


def test(f, lo, hi):
    """Test that all prime numbers are generated by comparing with the
    output of the library `pyprimes`"""
    a = filter(lambda p: p>lo, pyprimes.primes_below(hi))
    b = f(lo, hi)
    result = True
    for p, q in zip(a, b):
        if p != q:
            print(p, q)
            result = False
    return result


def main():
    lower = 1000
    upper = 5000
    print("The prime numbers between {} and {}, are:".format(lower, upper))
    print()
    for p in primes_between_using_cl(lower, upper):
        print(p, end=' ')
    print()


if __name__ == '__main__':
    main()
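Optionally, the test helper defined above can be used to cross-check the OpenCL results against the pyprimes reference, for example:

# Optional cross-check of the OpenCL version against pyprimes.
if test(primes_between_using_cl, 1000, 5000):
    print("OpenCL results match pyprimes")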