Julia CUDA - Reduce matrix columns
Consider the following kernel, which reduces across the rows of a 2D matrix:
using CUDA

function row_sum!(x, ncol, out)
    # out = sum(x, dims=2)
    # One thread per row: global 1-based thread index across all blocks
    row_idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    for i = 1:ncol
        @inbounds out[row_idx] += x[row_idx, i]
    end
    return
end

N = 1024
x = CUDA.rand(Float64, N, 2*N)
out = CUDA.zeros(Float64, N)
# 256 threads * 4 blocks = 1024 threads, one per row
@cuda threads=256 blocks=4 row_sum!(x, size(x)[2], out)
isapprox(out, sum(x, dims=2)) # true
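How do I write a similar kernel that reduces along the columns of a 2D matrix instead? In particular, how do I get the index of each column, similar to how row_idx gives the index of each row?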
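Here is the code. The global thread index is computed exactly as before; it is simply used as the column index instead of the row index: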
using CUDA

function col_sum!(x, nrow, out)
    # out = sum(x, dims=1)
    # One thread per column: same global index formula as in row_sum!
    col_idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    for i = 1:nrow
        @inbounds out[col_idx] += x[i, col_idx]
    end
    return
end

N = 1024
x = CUDA.rand(Float64, N, 2N)
out = CUDA.zeros(Float64, 2N)
# 256 threads * 8 blocks = 2048 threads, one per column
@cuda threads=256 blocks=8 col_sum!(x, size(x, 1), out)
Here is the test:
julia> isapprox(out, vec(sum(x, dims=1)))
true
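As you can see, the result vector now has size 2N instead of N, so the number of blocks has to be adjusted accordingly: it is multiplied by 2, giving 8 instead of 4. With threads=256, that launches 8 * 256 = 2048 threads, exactly one per column.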
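The launch configuration above hard-codes blocks=8, which only works because 2N happens to be a multiple of 256. As a minimal sketch of how this could be generalized (the kernel name col_sum_guarded!, the variables threads and blocks, and the choice of N = 1000 are illustrative assumptions, not part of the original answer), the block count can be derived with cld and the kernel can skip threads that fall past the last column:

```julia
using CUDA

# Hypothetical variant of col_sum! with a bounds guard, so the launch
# may contain more threads than there are columns.
function col_sum_guarded!(x, out)
    col_idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if col_idx <= size(x, 2)
        # Accumulate locally and write to global memory once.
        acc = zero(eltype(out))
        for i in 1:size(x, 1)
            @inbounds acc += x[i, col_idx]
        end
        @inbounds out[col_idx] = acc
    end
    return nothing
end

N = 1000                            # 2N = 2000 is not a multiple of 256
x = CUDA.rand(Float64, N, 2N)
out = CUDA.zeros(Float64, 2N)

threads = 256
blocks = cld(size(x, 2), threads)   # ceiling division: enough blocks to cover every column

@cuda threads=threads blocks=blocks col_sum_guarded!(x, out)
isapprox(out, vec(sum(x, dims=1)))
```

Here cld(2000, 256) launches 8 blocks, i.e. 2048 threads, and the guard leaves the 48 surplus threads idle instead of letting them index past the end of out. Without the guard, the @inbounds accesses in those threads would be out of bounds.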
More material can be found here: https://juliagpu.gitlab.io/CUDA.jl/tutorials/introduction/