Performance assigning and copying with StaticArrays.jl in Julia

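I was thinking of using the StaticArrays.jl package to improve the performance of my code. However, I only use arrays to store computed variables and read them back later, once certain conditions are met. So I benchmarked the SizedVector type against a normal Vector, but I do not understand the results of the code below. I also tried a StaticVector, using Setfield.jl as a work-around.
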
using StaticArrays, BenchmarkTools, Setfield
function copySized(n::Int64)
    v = SizedVector{n, Int64}(zeros(n))
    w = Vector{Int64}(undef, n)
    for i in eachindex(v)
        v[i] = i
    end
    for i in eachindex(v)
        w[i] = v[i]
    end
end
function copyStatic(n::Int64)
    v = @SVector zeros(n)
    w = Vector{Int64}(undef, n)
    for i in eachindex(v)
        @set v[i] = i  # note: @set returns a modified copy; the result is discarded here, so v is unchanged
    end
    for i in eachindex(v)
        w[i] = v[i]
    end
end
function copynormal(n::Int64)
    v = zeros(n)
    w = Vector{Int64}(undef, n)
    for i in eachindex(v)
        v[i] = i
    end
    for i in eachindex(v)
        w[i] = v[i]
    end
end
n = 10
@btime copySized($n)
@btime copyStatic($n)
@btime copynormal($n)

3.950 μs (42 allocations: 2.08 KiB)
5.417 μs (98 allocations: 4.64 KiB) 
78.822 ns (2 allocations: 288 bytes)

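Why does the SizedVector case have so many more allocations, and hence worse performance? Am I using SizedVector incorrectly? Shouldn't it perform at least as well as a normal array?

Thanks in advance.

Cross-posted on Julia Discourse.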

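@phipsgabler is right! Statically sized arrays have a performance advantage when the size is known statically, at compile time. My arrays, however, are dynamically sized: the size n is a runtime variable.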
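To see why a runtime size defeats the purpose, it can help to ask the compiler what it infers. The following is a minimal sketch, not taken from the original posts: `make_runtime` and `make_static` are hypothetical helper names, and `Base.return_types` is used only as a quick inspection tool.

```julia
using StaticArrays

# Size arrives as a runtime value, so the concrete type SVector{n,Int}
# cannot be determined when the function is compiled.
make_runtime(n::Int) = SVector{n,Int}(ntuple(identity, n))

# Size is a literal, so the compiler knows the result is an SVector{3,Int}.
make_static() = SVector{3,Int}(1, 2, 3)

# Ask inference for the return type of each function.
runtime_type = only(Base.return_types(make_runtime, (Int,)))
static_type  = only(Base.return_types(make_static, ()))

println("runtime size: ", runtime_type, " (concrete: ", isconcretetype(runtime_type), ")")
println("static size:  ", static_type, " (concrete: ", isconcretetype(static_type), ")")
```

When the inferred type is not concrete, later operations on the vector generally go through dynamic dispatch, which is a plausible source of the extra allocations in the first benchmark.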

Hard-coding the size as a literal yields more sensible results:

using StaticArrays, BenchmarkTools, Setfield
function copySized()
    v = SizedVector{10, Float64}(zeros(10))
    w = Vector{Float64}(undef, 10*2)
    for i in eachindex(v)
        v[i] = rand()
    end
    for i in eachindex(v)
        j = i+floor(Int64, 10/4)
        w[j] = v[i]
    end
end
function copyStatic()
    v = @SVector zeros(10)
    w = Vector{Int64}(undef, 10*2)
    for i in eachindex(v)
        @set v[i] = rand()  # note: the result of @set is discarded, so v stays all zeros
    end
    for i in eachindex(v)
        j = i+floor(Int64, 10/4)
        w[j] = v[i]
    end
end
function copynormal()
    v = zeros(10)
    w = Vector{Float64}(undef, 10*2)
    for i in eachindex(v)
        v[i] = rand()
    end
    for i in eachindex(v)
        j = i+floor(Int64, 10/4)
        w[j] = v[i]
    end
end
@btime copySized()
@btime copyStatic()
@btime copynormal()

110.162 ns (3 allocations: 512 bytes)
48.133 ns (1 allocation: 224 bytes)
92.045 ns (2 allocations: 368 bytes)
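One caveat when reading the copyStatic timing: `@set` from Setfield.jl does not mutate or rebind `v`; it returns a modified copy, so a bare `@set v[i] = ...` statement has no lasting effect. The sketch below illustrates the difference under that assumption. The function name `set_demo` is hypothetical, and `Base.setindex` is shown as a Setfield-free alternative.

```julia
using StaticArrays, Setfield

function set_demo()
    v = @SVector zeros(3)

    @set v[1] = 1.0              # returns a modified copy; result is discarded
    untouched = v                # v still holds only zeros here

    v2 = (@set v[1] = 1.0)       # keep the copy by binding it to a new name
    @set! v[2] = 2.0             # @set! rebinds the variable v to the copy

    # Non-mutating setindex, which StaticArrays implements for its static arrays.
    v3 = Base.setindex(v, 3.0, 3)

    return untouched, v2, v, v3
end

untouched, v2, v, v3 = set_demo()
println(untouched)
println(v2)
println(v)
println(v3)
```

In a loop, this means the body would need `@set! v[i] = rand()` (or `v = Base.setindex(v, rand(), i)`) for the writes to be visible afterwards.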

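I feel this is an apples-to-oranges comparison (the size should be stored statically in the type). More illustrative code could look like this: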

function copySized(::Val{n}) where n
    v = SizedVector{n}(1:n)
    w = Vector{Int64}(undef, n)
    w .= v
end
function copyStatic(::Val{n}) where n
    v =  SVector{n}(1:n)
    w = Vector{Int64}(undef, n)
    w .= v
end
function copynormal(n) 
    v = [1:n;]
    w = Vector{Int64}(undef, n)
    w .= v
end

Now the benchmarks:

julia> n = 10
10

julia> @btime copySized(Val{$n}());
  248.138 ns (1 allocation: 144 bytes)

julia> @btime copyStatic(Val{$n}());
  251.507 ns (1 allocation: 144 bytes)

julia> @btime copynormal($n);
  77.940 ns (2 allocations: 288 bytes)

julia> n = 1000
1000

julia> @btime copySized(Val{$n}());
  840.000 ns (2 allocations: 7.95 KiB)

julia> @btime copyStatic(Val{$n}());
  830.769 ns (2 allocations: 7.95 KiB)

julia> @btime copynormal($n);
  1.100 μs (2 allocations: 15.88 KiB)
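If n is only known at run time, as in the original question, `Val`-based functions like the ones above can still be reached through a function barrier: the size is lifted into the type domain once, and everything behind the barrier is compiled for that size. This is a minimal sketch with hypothetical names (`kernel`, `run_dynamic`), not code from the thread. Note that the call `kernel(Val(n))` itself is a dynamic dispatch, so it only pays off when the work behind the barrier outweighs that cost.

```julia
using StaticArrays

# Kernel: N is a type parameter, so the body is specialized for each size.
function kernel(::Val{N}) where {N}
    v = SVector{N,Int}(ntuple(identity, Val(N)))  # the values 1, 2, ..., N
    return sum(v)
end

# Entry point: n is a runtime value. Val(n) moves it into the type domain,
# and the call to kernel is resolved by dynamic dispatch.
run_dynamic(n::Int) = kernel(Val(n))

println(run_dynamic(10))
```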