Julia vs Mathematica：数值积分性能

Question

Julia 绝对是新人，这里有一个问题要问你。

我正在通过移植我的一些 Mathematica 和 Python 代码（主要是物理学中的科学计算等）来自学一些 Julia，然后看看是什么。到目前为止，一切都很顺利。而且很快。直到现在。

现在，我正在模拟一个基本的锁定放大器，本质上，它接收一个可能非常复杂的时间相关信号 Uin(t)，并产生一个输出 Uout(t)，锁相在某个参考频率fref（即突出显示Uin(t)的分量，它与参考正弦波有一定的相位关系）。 描述无关紧要，重要的是它基本上是通过计算积分来实现的（为了清楚起见，我实际上在这里省略了相位） :

所以，我在 Mathematica 和 Julia 中着手并测试了这个：我定义了一个模型 Uin(t)，传递了一些参数值，然后在时间 t = 0 构建了一个 Uout(t) 的数组，范围为 fref。

Julia: 我使用 QuadGK 包进行数值积分。

T = 0.1
f = 100.
Uin(t) = sin(2pi * f * t) + sin(2pi * 2f *t)
Uout(t, fref)::Float64 = quadgk(s -> Uin(s) * sin(2pi * fref * s), t-T, t, rtol=1E-3)[1]/T
frng = 80.:1.:120.
print(@time broadcast(Uout, 0., frng))

Mathematica

T = 0.1;
f = 100.;
Uin[t_] := Sin[2 π f t] + Sin[2 π 2 f t]
Uout[t_, fref_] := NIntegrate[Sin[2 π fref s] Uin[s], {s, t - T, t}]/T
frng = Table[i, {i, 80, 120, 1}];
Timing[Table[Uout[0, fr], {fr, frng}]]

结果：

Julia 在使用电池供电的 i7-5xxx 笔记本电脑上将操作计时为 45 到 55 秒之间的任何时间，很多，而 Mathematica 在约 2 秒内完成。差异是深远的，老实说，很难相信。我知道 Mathematica 的内核中有一些非常可爱和精致的算法，但 Julia 就是 Julia。所以，问题是：什么给了？

P.S.: 设置 f 和 T 为 const 确实将 Julia 的时间减少到 ~8-10 秒，但是f和T在实际程序中不能是const。除此之外，我还有什么明显的遗漏吗？

编辑 2020 年 2 月 2 日：

速度变慢似乎是由于当值接近于零时算法试图降低精度，例如见下文：对于 fref = 95，计算需要 1 整秒（！），而对于相邻的频率值，它会立即计算（返回的结果是 (res, error) 的元组）。似乎 quadgk 函数在非常小的值处停滞不前）：

  0.000124 seconds (320 allocations: 7.047 KiB)
fref = 94.0 (-0.08637214864144352, 9.21712218998258e-6)

  1.016830 seconds (6.67 M allocations: 139.071 MiB, 14.35% gc time)
fref = 95.0 (-6.088184966010742e-16, 1.046186419361636e-16)

  0.000124 seconds (280 allocations: 6.297 KiB)
fref = 96.0 (0.1254003757465191, 0.00010132083518769636)

注：这与我要求生产的公差无关。此外，默认情况下，Mathematica 通常会达到机器精度公差，而在接近零时会有所放缓，并且 numpy/scipy 只是飞过整个过程，但产生的结果不如 Mathematica 精确（在默认设置下；看起来不多进入这个）。

Answer 1

我看到您的代码的第一个问题是它的类型不稳定。这是 因为您正在使用全局变量 造成的（请参阅 Julia Performance Tips 处的第一个性能提示）：编译器无法知道您在函数中使用的 f 和 T 的类型，因此它无法进行高效的编译。这也是为什么当你将它们标记为 const 时，性能会提高：现在编译器可以保证它们不会改变它们的类型，因此它可以高效地编译你的两个函数。

如何看出你的代码不稳定

如果你运行你的第一个宏函数 @code_warntype 像这样：

@code_warntype Uin(0.1,f)

您将看到如下输出：

julia> @code_warntype Uin(0.1)
Variables
  #self#::Core.Compiler.Const(Uin, false)
  t::Float64

Body::Any
1 ─ %1 = (2.0 * Main.pi)::Core.Compiler.Const(6.283185307179586, false)
│   %2 = (%1 * Main.f * t)::Any
│   %3 = Main.sin(%2)::Any
│   %4 = (2.0 * Main.pi)::Core.Compiler.Const(6.283185307179586, false)
│   %5 = (2.0 * Main.f)::Any
│   %6 = (%4 * %5 * t)::Any
│   %7 = Main.sin(%6)::Any
│   %8 = (%3 + %7)::Any
└──      return %8

所有这些Anys告诉你编译在任何一步都不知道输出的类型。

如何修复

您可以重新定义函数以将 f 和 T 作为变量接收：

Uin(t,f) = sin(2.0pi * f * t) + sin(2.0pi * 2.0f *t)
Uout(t, fref,f,T)::Float64 = quadgk(s -> Uin(s,f) * sin(2pi * fref * s), t-T, t, rtol=1E-3)[1]/T

通过这些重新定义，您的代码运行变得更快了。如果您尝试使用 @code_warntype 检查它们，您会发现现在编译器正确地推断出所有内容的类型。

要进一步提高性能，您可以查看 Julia Performance Tips

特别是，通常建议的衡量性能而不是使用 @time 的方法是包 BenchmarkTools 中的 @btime。之所以如此，是因为当运行ning @time 你也在测量编译时间（另一种选择是运行 @time 两次 - 第二次测量将是正确的，因为所有函数都有机会编译).

Answer 2

您可以采取多种措施进一步加快速度。改变积分的顺序确实有点帮助，使用 Float32 代替 Float64 做了一个小改进，使用 @fastmath 做了进一步的小改进。也可以使用 SLEEFPirates.sin_fast

using QuadGK, ChangePrecision

@changeprecision Float32 begin
    T = 0.1
    f = 100.
    @inline @fastmath Uin(t,f) = sin(2pi * f * t) + sin(2pi * 2f *t)
    @fastmath Uout(t, fref,f,T) = first(quadgk(s -> Uin(s,f) * sin(2pi * fref * s), t-T, t, rtol=1e-2, order=10))/T

    frng = 80.:1.:120.
    @time Uout.(0., frng, f, T)
end

Answer 3

你的问题与容错的选择有关。 1e-3 的相对误差听起来还不错，但实际上是在积分接近于零的时候。特别是，当 fref = 80.0（和 85、90、95，不是 100、105 等）时会发生这种情况：

julia> Uout(0.0, 80.0, f, T)
1.2104987553880609e-16

引用quadgk的文档字符串：

(Note that it is useful to specify a positive atol in cases where norm(I) may be zero.)

让我们尝试设置一个绝对容差，例如1e-6，然后进行比较。首先是代码（使用来自@ARamirez 的代码）：

Uin(t, f) = sin(2π * f * t) + sin(4π * f * t)

function Uout(t, fref, f , T)
    quadgk(s -> Uin(s, f) * sin(2π * fref * s), t-T, t, rtol=1e-3)[1]/T
end
function Uout_new(t, fref, f , T) # with atol
    quadgk(s -> Uin(s, f) * sin(2π * fref * s), t-T, t, rtol=1e-3, atol=1e-6)[1]/T
end

然后进行基准测试（为此使用 BenchmarkTools）

using BenchmarkTools
T = 0.1
f = 100.0
freqs = 80.0:1.0:120.0

@btime Uout.(0.0, $freqs, $f, $T);
6.302 s (53344283 allocations: 1.09 GiB)

@btime Uout_new.(0.0, $freqs, $f, $T);
1.270 ms (11725 allocations: 262.08 KiB)

好的，那快了 5000 倍。可以吗？

Julia vs Mathematica：数值积分性能

Julia vs Mathematica: Numerical integration performance

wolfram-mathematica

numerical-integration

julia

如何看出你的代码不稳定

如何修复