Reduction of an array in Cython parallel
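I have an array that needs to hold the sums of several different things, so I want to perform a reduction on each of its elements.
Here is the code: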
cdef int *a=<int *>malloc(sizeof(int) * 3)
for i in range(3):
    a[i]=1*i
cdef int *b
for i in prange(1000,nogil=True,num_threads=10):
    b=res() #res returns an array initialized to 1s
    with gil: #if commented this line gives erroneous results
        for k in range(3):
            a[k]+=b[k]
for i in range(3):
    print a[i]
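As long as the with gil block is there, the code runs correctly; without it, the results are wrong.
How can I reduce each element of the array without using the gil? I think the gil will block the other threads.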
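The way reductions usually work in practice is that each thread computes its own partial sum, and the partial sums are added together at the end. You can do this manually with something like: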
from cython.parallel import parallel, prange
from libc.stdlib cimport malloc, free

cdef int *b
cdef int *a_local # version of a that is duplicated by each thread
cdef int i,j,k

# set up as before
cdef int *a=<int *>malloc(sizeof(int) * 3)
for i in range(3):
    a[i]=1*i

# multithreaded from here
with nogil, parallel(num_threads=10):
    # setup and initialise a_local on each thread
    a_local = <int*>malloc(sizeof(int)*3)
    for k in range(3):
        a_local[k] = 0

    for i in prange(1000):
        b=res() # Note - you never free b
                # this is likely a memory leak....
        for j in range(3):
            a_local[j]+=b[j]

    # finally at the end add them all together.
    # this needs to be done `with gil:` to avoid race conditions
    # but it isn't a problem
    # because it's only a small amount of work being done
    with gil:
        for k in range(3):
            a[k] += a_local[k]
    free(a_local)
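The same structure can be sketched in plain Python with the standard threading module, as a rough illustration of the pattern rather than a replacement for the Cython code. Everything named here is an assumption made for the example: res is a hypothetical stand-in that returns a list of 1s (as the question describes), and reduce_in_parallel, N_THREADS, N_ITEMS and WIDTH are invented names. Each worker accumulates into its own private list and takes the lock only once, at the end, to merge its partial sums into the shared total.

```python
import threading

N_THREADS = 4    # illustrative value, not taken from the question
N_ITEMS = 1000   # same iteration count as the prange loop
WIDTH = 3        # length of the array being reduced


def res():
    """Hypothetical stand-in for the question's res(): an array of 1s."""
    return [1] * WIDTH


def reduce_in_parallel():
    total = [1 * i for i in range(WIDTH)]  # same initial values as `a`
    lock = threading.Lock()

    def worker(start, stop):
        # Private accumulator: no other thread ever touches it.
        local = [0] * WIDTH
        for _ in range(start, stop):
            b = res()
            for j in range(WIDTH):
                local[j] += b[j]
        # One short critical section per thread to merge the partial sums.
        with lock:
            for k in range(WIDTH):
                total[k] += local[k]

    chunk = N_ITEMS // N_THREADS
    threads = []
    for t in range(N_THREADS):
        start = t * chunk
        # The last thread picks up any remainder of the range.
        stop = N_ITEMS if t == N_THREADS - 1 else start + chunk
        th = threading.Thread(target=worker, args=(start, stop))
        threads.append(th)
        th.start()
    for th in threads:
        th.join()
    return total


if __name__ == "__main__":
    print(reduce_in_parallel())  # [1000, 1001, 1002]
```

The lock here plays the role that with gil: plays in the Cython version: it is taken once per thread instead of once per iteration, so the time spent serialized stays small. Pure Python threads are not expected to speed up a loop like this one; the sketch only shows how the work is laid out.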