Performance assigning and copying with StaticArrays.jl in Julia
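I am considering using the StaticArrays.jl package to improve the performance of my code. However, I only use arrays to store computed variables and use them later, once certain conditions are met. So I benchmarked the SizedVector type against a normal Vector, but I do not understand the results for the code below. I also tried StaticVector, using Setfield.jl as a workaround.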
using StaticArrays, BenchmarkTools, Setfield

function copySized(n::Int64)
    v = SizedVector{n, Int64}(zeros(n))
    w = Vector{Int64}(undef, n)
    for i in eachindex(v)
        v[i] = i
    end
    for i in eachindex(v)
        w[i] = v[i]
    end
end

function copyStatic(n::Int64)
    v = @SVector zeros(n)
    w = Vector{Int64}(undef, n)
    for i in eachindex(v)
        @set v[i] = i
    end
    for i in eachindex(v)
        w[i] = v[i]
    end
end

function copynormal(n::Int64)
    v = zeros(n)
    w = Vector{Int64}(undef, n)
    for i in eachindex(v)
        v[i] = i
    end
    for i in eachindex(v)
        w[i] = v[i]
    end
end

n = 10
@btime copySized($n)
@btime copyStatic($n)
@btime copynormal($n)
3.950 μs (42 allocations: 2.08 KiB)
5.417 μs (98 allocations: 4.64 KiB)
78.822 ns (2 allocations: 288 bytes)
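Why does the SizedVector case have more allocations, and therefore worse performance? Am I using SizedVector incorrectly? Shouldn't it perform at least as well as a normal array?
Thanks in advance.
Cross-posted on Julia Discourse
@phipsgabler is right! Statically sized arrays have a performance advantage when the size is known statically at compile time. My arrays, however, were dynamically sized, with the size n being a runtime variable.
Changing this yields more sensible results: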
using StaticArrays, BenchmarkTools, Setfield

function copySized()
    v = SizedVector{10, Float64}(zeros(10))
    w = Vector{Float64}(undef, 10*2)
    for i in eachindex(v)
        v[i] = rand()
    end
    for i in eachindex(v)
        j = i+floor(Int64, 10/4)
        w[j] = v[i]
    end
end

function copyStatic()
    v = @SVector zeros(10)
    w = Vector{Int64}(undef, 10*2)
    for i in eachindex(v)
        @set v[i] = rand()
    end
    for i in eachindex(v)
        j = i+floor(Int64, 10/4)
        w[j] = v[i]
    end
end

function copynormal()
    v = zeros(10)
    w = Vector{Float64}(undef, 10*2)
    for i in eachindex(v)
        v[i] = rand()
    end
    for i in eachindex(v)
        j = i+floor(Int64, 10/4)
        w[j] = v[i]
    end
end

@btime copySized()
@btime copyStatic()
@btime copynormal()
110.162 ns (3 allocations: 512 bytes)
48.133 ns (1 allocation: 224 bytes)
92.045 ns (2 allocations: 368 bytes)
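To make the point about compile-time sizes concrete, here is a minimal sketch (not taken from the original posts) showing that the length of an SVector is part of its type and that an SVector cannot be written to in place. The variable names are purely illustrative, and the MVector route is just one possible way to fill values element by element.

```julia
using StaticArrays

# The length is a type parameter: vectors of different lengths have different types.
a = SVector(1, 2, 3)
b = SVector(1, 2, 3, 4)
@assert typeof(a) === SVector{3, Int}
@assert typeof(a) !== typeof(b)

# SVector is immutable, so in-place assignment throws an error.
threw = try
    a[1] = 10
    false
catch
    true
end

# One way to fill values element by element: write into a mutable MVector,
# then convert the result to an immutable SVector of the same length.
m = MVector{3, Float64}(undef)
for i in eachindex(m)
    m[i] = i / 2
end
s = SVector{3, Float64}(m)
```

Note that Setfield's `@set` returns a modified copy rather than rebinding `v`, so the loops in `copyStatic` above most likely leave `v` unchanged; the timing for that variant should be read with this in mind.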
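I feel this is an apples-to-oranges comparison (the size should be stored statically in the type). More illustrative code could look like this: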
function copySized(::Val{n}) where n
    v = SizedVector{n}(1:n)
    w = Vector{Int64}(undef, n)
    w .= v
end

function copyStatic(::Val{n}) where n
    v = SVector{n}(1:n)
    w = Vector{Int64}(undef, n)
    w .= v
end

function copynormal(n)
    v = [1:n;]
    w = Vector{Int64}(undef, n)
    w .= v
end
And now the benchmarks:
julia> n = 10
10
julia> @btime copySized(Val{$n}());
248.138 ns (1 allocation: 144 bytes)
julia> @btime copyStatic(Val{$n}());
251.507 ns (1 allocation: 144 bytes)
julia> @btime copynormal($n);
77.940 ns (2 allocations: 288 bytes)
julia>
julia>
julia> n = 1000
1000
julia> @btime copySized(Val{$n}());
840.000 ns (2 allocations: 7.95 KiB)
julia> @btime copyStatic(Val{$n}());
830.769 ns (2 allocations: 7.95 KiB)
julia> @btime copynormal($n);
1.100 μs (2 allocations: 15.88 KiB)
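The `Val{n}` argument used above moves the size into the type domain. When the size is only known at run time, one common way to apply the same idea is a function barrier: convert the runtime value to `Val` once, then do all the work inside a function that is specialized on it. The sketch below assumes this pattern fits the use case; `sum_squares` and `sum_squares_dynamic` are hypothetical names, not part of StaticArrays.jl.

```julia
using StaticArrays

# Kernel: N is a type parameter, so the vector length is known to the compiler here.
function sum_squares(::Val{N}) where {N}
    v = SVector{N, Int}(ntuple(i -> i^2, Val(N)))
    return sum(v)
end

# Barrier: the runtime value n is lifted into a type once.
# This single call is dispatched dynamically; the kernel body is statically sized.
sum_squares_dynamic(n::Int) = sum_squares(Val(n))
```

Each distinct `n` triggers compilation of a new specialization, so this is likely to pay off only when a small set of sizes is reused many times.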