显式定义变量与访问数组

Defining variables explicitly vs accessing arrays

我正在实施具有自适应步长 (RK45) 的 Runge-Kutta-Fehlberg 方法。我在笔记本中用

定义并调用我的 Butcher tableau
module FehlbergTableau
    using StaticArrays
    export A, B, CH, CT

    A = @SVector [ 0 , 2/9 , 1/3 , 3/4 , 1 , 5/6 ]

    B = @SMatrix [ 0          0         0        0        0
                   2/9        0         0        0        0
                   1/12       1/4       0        0        0
                   69/128    -243/128   135/64   0        0
                  -17/12      27/4     -27/5     16/15    0
                   65/432    -5/16      13/16    4/27     5/144 ]

    CH = @SVector [ 47/450 , 0 , 12/25 , 32/225 , 1/30 , 6/25 ]

    CT = @SVector [ -1/150 , 0 , 3/100 , -16/75 , -1/20 , 6/25 ]
end

using .FehlbergTableau

如果我直接将 RK45 的算法编码为

function infinitesimal_flow(A::SVector{6,Float64}, B::SMatrix{6,5,Float64}, CH::SVector{6,Float64}, CT::SVector{6,Float64}, t0::Float64, Δt::Float64, J∇H::Function, x0::SVector{N,Float64}) where N
    
    k1 = Δt * J∇H( t0 + Δt*A[1], x0 ) 
    k2 = Δt * J∇H( t0 + Δt*A[2], x0 + B[2,1]*k1 ) 
    k3 = Δt * J∇H( t0 + Δt*A[3], x0 + B[3,1]*k1 + B[3,2]*k2 )
    k4 = Δt * J∇H( t0 + Δt*A[4], x0 + B[4,1]*k1 + B[4,2]*k2 + B[4,3]*k3 )
    k5 = Δt * J∇H( t0 + Δt*A[5], x0 + B[5,1]*k1 + B[5,2]*k2 + B[5,3]*k3 + B[5,4]*k4 )
    k6 = Δt * J∇H( t0 + Δt*A[6], x0 + B[6,1]*k1 + B[6,2]*k2 + B[6,3]*k3 + B[6,4]*k4 + B[6,5]*k5 )
    
    TE = CT[1]*k1 + CT[2]*k2 + CT[3]*k3 + CT[4]*k4 + CT[5]*k5 + CT[6]*k6  
    xt = x0 + CH[1]*k1 + CH[2]*k2 + CH[3]*k3 + CH[4]*k4 + CH[5]*k5 + CH[6]*k6 
    
    norm(TE), xt
end                  

并将其与更紧凑的实现进行比较

function infinitesimal_flow_2(A::SVector{6,Float64}, B::SMatrix{6,5,Float64}, CH::SVector{6,Float64}, CT::SVector{6,Float64}, t0::Float64,Δt::Float64,J∇H::Function, x0::SVector{N,Float64}) where N
    
    k = MMatrix{N,6}(0.0I)
    TE = zero(x0); xt = x0
    
    for i=1:6
        # EDIT: this is wrong! there should be a new variable here, as pointed 
        # out by Lutz Lehmann: xs = x0
        for j=1:i-1
            # xs += B[i,j] * k[:,j]
            x0 += B[i,j] * k[:,j] #wrong
        end
        k[:,i] = Δt *  J∇H(t0 + Δt*A[i], x0) 

        TE += CT[i]*k[:,i]
        xt += CH[i]*k[:,i]B[i,j] * k[:,j]
    end
    norm(TE), xt
end

那么显式定义变量的第一个函数要快得多:

J∇H(t::Float64, X::SVector{N,Float64}) where N = @SVector [ -X[2]^2, X[1] ]
x0 = SVector{2}([0.0, 1.0])
infinitesimal_flow(A, B, CH, CT, 0.0, 1e-2, J∇H, x0)
infinitesimal_flow_2(A, B, CH, CT, 0.0, 1e-2, J∇H, x0)

@btime infinitesimal_flow($A, $B, $CH, $CT, 0.0, 1e-2, $J∇H, $x0)
>> 19.387 ns (0 allocations: 0 bytes)
@btime infinitesimal_flow_2($A, $B, $CH, $CT, 0.0, 1e-2, $J∇H, $x0)
>> 50.985 ns (0 allocations: 0 bytes)

我找不到类型不稳定性或任何可以证明延迟合理的东西,对于更复杂的画面,我必须以循环形式使用算法。我做错了什么?

P.S.: infinitesimal_flow_2 中的瓶颈是行 k[:,i] = Δt * J∇H(t0 + Δt*A[i], x0).

RK方法的每个阶段直接从RK步骤的基点计算其评估点。这在第一种方法中是明确的。在第二种方法中,您必须在每个阶段重置点计算,例如 in

    for i=1:6
        xs = x0
        for j=1:i-1
            xs += B[i,j] * k[:,j]
        end
        k[:,i] = Δt *  J∇H(t0 + Δt*A[i], xs) 
        ...

步长计算中最轻微的错误都可能导致步长控制器发生灾难性的失误,迫使步长向零下降,从而使工作量急剧增加。一个例子是