Defining variables explicitly vs accessing arrays
I am implementing the Runge-Kutta-Fehlberg method with adaptive step size (RK45). In my notebook I define and load my Butcher tableau with
module FehlbergTableau
using StaticArrays
export A, B, CH, CT
A = @SVector [ 0 , 2/9 , 1/3 , 3/4 , 1 , 5/6 ]
B = @SMatrix [ 0 0 0 0 0
2/9 0 0 0 0
1/12 1/4 0 0 0
69/128 -243/128 135/64 0 0
-17/12 27/4 -27/5 16/15 0
65/432 -5/16 13/16 4/27 5/144 ]
CH = @SVector [ 47/450 , 0 , 12/25 , 32/225 , 1/30 , 6/25 ]
CT = @SVector [ -1/150 , 0 , 3/100 , -16/75 , -1/20 , 6/25 ]
end
using .FehlbergTableau
using StaticArrays, LinearAlgebra, BenchmarkTools   # SVector/SMatrix in the signatures below; norm and I; @btime
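As a quick sanity check, every node A[i] should equal the sum of row i of B (the consistency condition c_i = Σ_j a_ij), which this tableau satisfies; it can be verified with, e.g.,

# Consistency check: each node c_i must equal the sum of its row of coefficients a_ij.
# (Row 1 is empty and A[1] == 0, so it is trivially consistent.)
all(isapprox(A[i], sum(B[i, j] for j = 1:i-1)) for i = 2:6)   # returns true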
If I code the RK45 algorithm directly as
function infinitesimal_flow(A::SVector{6,Float64}, B::SMatrix{6,5,Float64}, CH::SVector{6,Float64}, CT::SVector{6,Float64}, t0::Float64, Δt::Float64, J∇H::Function, x0::SVector{N,Float64}) where N
k1 = Δt * J∇H( t0 + Δt*A[1], x0 )
k2 = Δt * J∇H( t0 + Δt*A[2], x0 + B[2,1]*k1 )
k3 = Δt * J∇H( t0 + Δt*A[3], x0 + B[3,1]*k1 + B[3,2]*k2 )
k4 = Δt * J∇H( t0 + Δt*A[4], x0 + B[4,1]*k1 + B[4,2]*k2 + B[4,3]*k3 )
k5 = Δt * J∇H( t0 + Δt*A[5], x0 + B[5,1]*k1 + B[5,2]*k2 + B[5,3]*k3 + B[5,4]*k4 )
k6 = Δt * J∇H( t0 + Δt*A[6], x0 + B[6,1]*k1 + B[6,2]*k2 + B[6,3]*k3 + B[6,4]*k4 + B[6,5]*k5 )
TE = CT[1]*k1 + CT[2]*k2 + CT[3]*k3 + CT[4]*k4 + CT[5]*k5 + CT[6]*k6      # truncation-error estimate
xt = x0 + CH[1]*k1 + CH[2]*k2 + CH[3]*k3 + CH[4]*k4 + CH[5]*k5 + CH[6]*k6 # updated solution
norm(TE), xt
end
and compare it with a more compact implementation,
function infinitesimal_flow_2(A::SVector{6,Float64}, B::SMatrix{6,5,Float64}, CH::SVector{6,Float64}, CT::SVector{6,Float64}, t0::Float64,Δt::Float64,J∇H::Function, x0::SVector{N,Float64}) where N
k = MMatrix{N,6}(0.0I)   # stage storage: column i holds k_i
TE = zero(x0); xt = x0
for i=1:6
# EDIT: this is wrong! there should be a new variable here, as pointed
# out by Lutz Lehmann: xs = x0
for j=1:i-1
# xs += B[i,j] * k[:,j]
x0 += B[i,j] * k[:,j] #wrong
end
k[:,i] = Δt * J∇H(t0 + Δt*A[i], x0)
TE += CT[i]*k[:,i]   # truncation-error estimate
xt += CH[i]*k[:,i]   # updated solution
end
norm(TE), xt
end
then the first function, with its explicitly defined variables, is much faster:
J∇H(t::Float64, X::SVector{N,Float64}) where N = @SVector [ -X[2]^2, X[1] ]
x0 = SVector{2}([0.0, 1.0])
infinitesimal_flow(A, B, CH, CT, 0.0, 1e-2, J∇H, x0)
infinitesimal_flow_2(A, B, CH, CT, 0.0, 1e-2, J∇H, x0)
@btime infinitesimal_flow($A, $B, $CH, $CT, 0.0, 1e-2, $J∇H, $x0)
>> 19.387 ns (0 allocations: 0 bytes)
@btime infinitesimal_flow_2($A, $B, $CH, $CT, 0.0, 1e-2, $J∇H, $x0)
>> 50.985 ns (0 allocations: 0 bytes)
I cannot find any type instability or anything else that would justify the slowdown, and for more complicated Butcher tableaus I would have to use the algorithm in its loop form. What am I doing wrong?
P.S.: The bottleneck in infinitesimal_flow_2 is the line k[:,i] = Δt * J∇H(t0 + Δt*A[i], x0).
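For reference, one way to probe for type instabilities or hidden allocations, assuming the definitions above are in scope:

using InteractiveUtils   # provides @code_warntype outside the REPL
@code_warntype infinitesimal_flow_2(A, B, CH, CT, 0.0, 1e-2, J∇H, x0)   # inferred types are all concrete
@allocated infinitesimal_flow_2(A, B, CH, CT, 0.0, 1e-2, J∇H, x0)       # 0 after the first (compiling) call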
Each stage of an RK method computes its evaluation point directly from the base point of the RK step. This is explicit in the first method. In the second method you have to reset the point computation in each stage, for example as in
for i=1:6
xs = x0
for j=1:i-1
xs += B[i,j] * k[:,j]
end
k[:,i] = Δt * J∇H(t0 + Δt*A[i], xs)
...
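For completeness, the fully corrected loop variant could look like this (a sketch; the name infinitesimal_flow_2_fixed is just for illustration, the rest follows the question, and the result should now match infinitesimal_flow):

function infinitesimal_flow_2_fixed(A::SVector{6,Float64}, B::SMatrix{6,5,Float64}, CH::SVector{6,Float64}, CT::SVector{6,Float64}, t0::Float64, Δt::Float64, J∇H::Function, x0::SVector{N,Float64}) where N
    k = MMatrix{N,6}(0.0I)          # column i holds stage k_i
    TE = zero(x0); xt = x0
    for i = 1:6
        xs = x0                     # every stage restarts from the base point of the step
        for j = 1:i-1
            xs += B[i,j] * k[:,j]
        end
        k[:,i] = Δt * J∇H(t0 + Δt*A[i], xs)
        TE += CT[i]*k[:,i]          # truncation-error estimate
        xt += CH[i]*k[:,i]          # updated solution
    end
    norm(TE), xt
end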
The slightest error in the stage computation can lead to a catastrophic misstep of the step-size controller, forcing the step size toward zero and blowing up the amount of work accordingly.
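To see why this matters, a standard proportional controller for a 4(5) pair rescales the step by roughly 0.9·(tol/err)^(1/5): if err is inflated by a wrong stage computation, steps keep getting rejected and Δt is driven toward zero. A minimal driver sketch, assuming the corrected function above and a user-chosen tolerance tol:

function integrate(J∇H, x0::SVector{N,Float64}, t0, t1; Δt = 1e-2, tol = 1e-8) where N
    t, x = t0, x0
    while t < t1
        Δt = min(Δt, t1 - t)
        err, xnew = infinitesimal_flow_2_fixed(A, B, CH, CT, t, Δt, J∇H, x)
        if err <= tol
            t += Δt; x = xnew                        # accept the step
        end
        # proportional controller: an inflated err shrinks Δt, in the worst case toward zero
        Δt *= clamp(0.9 * (tol / err)^(1/5), 0.2, 5.0)
    end
    return x
end

A call such as integrate(J∇H, x0, 0.0, 1.0) would then step adaptively with the tableau above.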