Julia compiler does not appear to optimize when a function is passed a function

Question

Second edit: This pull request on GitHub will fix the issue. As long as you are running Julia v0.5+, anonymous functions will be just as fast as regular functions. Case closed.

Edit: I have updated the question and the function definitions to a more general case.

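In a simple example, the Julia compiler does not appear to optimize when a function is passed to a function or when a function is defined within a function. This surprises me, since this pattern is very common in optimization packages. Am I correct, or am I doing something stupid? A simple example follows: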

f(a::Int, b::Int) = a - b    #A simple function

function g1(N::Int, fIn::Function)   #Case 1: Passing in a function
    z = 0
    for n = 1:N
        z += fIn(n, n)
    end
    return(z)
end

function g2(N::Int)   #Case 2: Function defined within a function
    fAnon = f
    z = 0
    for n = 1:N
        z += fAnon(n, n)
    end
    return(z)
end

function g3(N::Int)   #Case 3: Function not defined within function
    z = 0
    for n = 1:N
        z += f(n, n)
    end
    return(z)
end

I then ran the following code to time the three cases:

#Run the functions once
g1(10, f)   
g2(10)
g3(10)

@time g1(100000000, f)
@time g2(100000000)
@time g3(100000000)

The timings are:

elapsed time: 5.285407555 seconds (3199984880 bytes allocated, 33.95% gc time)
elapsed time: 5.424531599 seconds (3199983728 bytes allocated, 32.59% gc time)
elapsed time: 2.473e-6 seconds (80 bytes allocated)

The first two cases involve a lot of memory allocation and garbage collection. Can anyone explain why?

Answer 1

One interesting thing to do is to use @code_warntype in Julia 0.4, which shows the following:

julia> @code_warntype g1(10, f)
Variables:
  N::Int64
  fIn::F
  z::Any
  #s1::Int64
  n::Int64

Body:
  begin  # none, line 2:
      z = 0 # line 3:
... snip ....
      z = z + (fIn::F)(n::Int64,n::Int64)::Any::Any

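So the problem is inference of the return type of fIn, which could really be anything. The issue (as far as I understand it) is that Julia compiles a method for each combination of argument types. Here the argument type is just Function, so the code is generated for any function, and anything could come back. It would be nice if Function were parametric on the return type, because then we could do something smarter, like Function{T<:Any,Int}.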
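The effect of an abstract argument type on inference can also be checked directly. The following is a minimal sketch, assuming a recent Julia 1.x release (not the 0.4 version the answer was written for) and the unexported helper Base.return_types. It asks inference what g1 returns when the second argument is only known to be a Function, and again when it is known to be exactly f.

```julia
f(a::Int, b::Int) = a - b

function g1(N::Int, fIn::Function)
    z = 0
    for n = 1:N
        z += fIn(n, n)
    end
    return z
end

# Only the abstract type Function is known: the callee is unknown,
# so inference cannot narrow the result of fIn(n, n).
abstract_case = Base.return_types(g1, (Int, Function))

# The exact type of f is known: the call can be resolved and inferred.
concrete_case = Base.return_types(g1, (Int, typeof(f)))

println(abstract_case)
println(concrete_case)
```

The first query mirrors the situation in the 0.4 output above, where z is inferred as Any.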

My solution was to change the line to z += fIn(n, n)::Int, which allows z to always be an Int, but I still see

(top(typeassert))((fIn::F)(n::Int64,n::Int64)::Any,Int)::Int64

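in the @code_warntype output. That makes sense, because the call result really is still an Any; I am only making sure it does not pollute the rest of the function. I think the compiler still has to generate code to check that the value is actually an Int. Let's call this new version g1A: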

julia> @time g1(1000000, f)
elapsed time: 0.124437357 seconds (30 MB allocated, 2.82% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.121653131 seconds (30 MB allocated, 2.51% gc time in 2 pauses with 0 full sweep)
elapsed time: 0.120805345 seconds (30 MB allocated, 1.17% gc time in 1 pauses with 0 full sweep)

julia> @time g1A(1000000, f)
elapsed time: 0.085875439 seconds (30 MB allocated, 5.20% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.074592531 seconds (30 MB allocated, 4.67% gc time in 2 pauses with 0 full sweep)
elapsed time: 0.078681071 seconds (30 MB allocated, 4.75% gc time in 1 pauses with 0 full sweep)
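The answer does not list g1A in full. A minimal sketch, assuming it is g1 with only the accumulation line changed (the explicit return is an addition here so the result can be inspected):

```julia
f(a::Int, b::Int) = a - b

function g1A(N::Int, fIn::Function)
    z = 0
    for n = 1:N
        # The type assertion keeps z an Int; it is checked at run time
        # and throws a TypeError if fIn returns anything else.
        z += fIn(n, n)::Int
    end
    return z
end

println(g1A(10, f))
```

The assertion is a run-time check, so a function that returns, say, a Float64 fails on the first iteration instead of silently changing the type of z.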

So there is some gain, but it is not ideal. This is a known issue that goes deep into Julia's inner workings. Related discussion:

Answer 2

This has been fixed in Julia v0.5. All three cases should now give the same performance as g3.
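A minimal sketch for checking this on a current installation, assuming Julia 1.x (isconcretetype and the 100_000 literal syntax are not available in 0.5). Since v0.5 each function has its own concrete type, which is what lets g1 be compiled specifically for f:

```julia
f(a::Int, b::Int) = a - b

function g1(N::Int, fIn::Function)   # Case 1: passing in a function
    z = 0
    for n = 1:N
        z += fIn(n, n)
    end
    return z
end

function g3(N::Int)                  # Case 3: calling f directly
    z = 0
    for n = 1:N
        z += f(n, n)
    end
    return z
end

# Each function has its own type, a concrete subtype of Function.
println(typeof(f) <: Function)
println(isconcretetype(typeof(f)))

# Run once to compile, then time both versions.
g1(10, f)
g3(10)
@time g1(1_000_000, f)
@time g3(1_000_000)
```

Exact timings depend on the machine and Julia version, so none are shown here.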


julia