在 julia 中结合 repmat 和转置

Question

我正在将一些代码从 R 移植到 julia 以熟悉这门语言，但我发现了一些翻译不流畅的模式。考虑以下函数，

# Ricatti-Bessel and derivatives up to nmax, vectorised over x
function rb(x, nmax)

  n = 1:nmax
  nu = 0.5 + [0, n]

  bj = hcat([besselj(nu, _x) for _x in x]...).'
  # ^ first question ^ 
  sq = repmat(sqrt(pi/2*x), 1, nmax+1)

  bj .*= sq
  xm = repmat(x, 1, nmax)
  nm = repmat(n', length(x), 1)
  # ^ second question ^ 

  dpsi = bj[:,n] - nm .* bj[:,n+1] ./ xm 
  psi = bj[:,n+1]
  return psi, dpsi # it'd be nice to return a "named list" instead

end
# rb(1:5,3)

第一个问题：这是使用 besselj() 获得具有 nmax 列和 length(x) 行的矩阵的最佳方法吗？在找到有效的模式之前，我不得不摸索了很长时间。
第二个问题：我发现自己经常需要转置对象，在 and/or out of repmat 中，是否有替代方法可以指定输出大小和填充方向（按行或 col-wise)?

也许我对整个事情采取了错误的方法：我习惯于使用矢量化函数（在 R 中，以及 Matlab 的旧记忆），因为它们通常是快速例程的最短路线线性代数。始终保持 x 为标量并仅在最高级别循环是否更有意义？我担心这样做，我将无法使用来自 BLAS 等的快速 matrix/vector 函数，并且基本上无法在 julia 中重写它们，更不用说明显的可读性损失了。我应该强调，我对最佳性能很感兴趣，因为对于许多 x 值，此函数将在内部被调用很多次。

Answer 1

对于你的第一个问题，我将用以下矩阵理解替换它：

nu = (0:nmax)+0.5
bj = [besselj(i,j) for j in x, i in nu]

对于你的第二个，我认为在 Julia 中编写高性能代码的一个好原则是避免不必要的分配（当然还有阅读 performance tips！）Julia 尽可能生成非常快的指令 - 这是为什么 for 循环非常好，除了线性代数（例如矩阵乘法）之外，向量化任何东西都不是很重要。它做得不好的是避免分配不必要的内存（像 sq 这样的临时矩阵）。我将 bj/sq 行替换为以下

nu = (0:nmax)+0.5
bj = [besselj(i,j)*sqrt(pi/2*j) for j in x, i in nu]

这很好，因为它只有一个分配，并且在我们之前（假设我们从我对 Q1 的回答开始）：

分配bj
分配sq
分配 bj.*sq 并重新绑定 bj 到这个新内存

（注意 .*= 是不是就地操作！）

你对 "named list" 的请求现在可能最好通过为这个函数的 return 做一个 type 来满足（这根本不是一个昂贵的操作，并且在例如 Julia 的矩阵分解代码中非常常见，其中需要对多个值进行 returned）。或者，您可以 return a Dict，但这并不符合习惯。

对于 dpsi 行，我会给你两个选项。首先是另一个矩阵推导：

dpsi = [ bj[i,j] - j * bj[i,j+1] / i 
           for i in 1:length(x), j in 1:nmax]

另一个是for loop-y:

dpsi = zeros(length(x),nmax)
for i in 1:length(x), j in 1:nmax
  dpsi[i,j] = bj[i,j] - j * bj[i,j+1] / i
end

在这两种情况下，我都避免分配临时对象。同样，您的原件具有以下分配：

对于xm
对于nm
对于bj[:,n]（这将在 0.4 中变为视图）
对于bj[:,n+1]（同上）
对于nm .* bj[:,n+1]
对于nm .* bj[:,n+1] ./ xm
对于整个结果

而且我提出的两个版本都只有一个分配，并且可能更接近问题的原始数学陈述

我的最终版本是

function myrb(x, nmax)
  bj   = [ besselj(i,j)*sqrt(pi/2*j)
              for j in x, i in (0:nmax)+0.5]
  dpsi = [ bj[i,j] - j * bj[i,j+1] / i 
              for i in 1:length(x), j in 1:nmax]
  psi  = bj[:,2:nmax+1]
  return psi, dpsi
end

我对 besselj 了解不多，但我猜它是整个过程中最慢的部分，所以就速度而言，这一切可能并不重要这个特殊情况。对这个微型案例进行基准测试表明：

# original
elapsed time: 9.7578e-5 seconds (7176 bytes allocated)
elapsed time: 7.2644e-5 seconds (7176 bytes allocated)
elapsed time: 7.5709e-5 seconds (7176 bytes allocated)
# revised
elapsed time: 2.7536e-5 seconds (728 bytes allocated)
elapsed time: 2.7097e-5 seconds (728 bytes allocated)
elapsed time: 1.6601e-5 seconds (728 bytes allocated)

您可以通过探查器确认这一点（尽管我不得不使用更大的输入:)

@profile myrb(1:500,300)
Profile.print()

在我的机器上，该函数收集了 429 个样本，其中 426 个在 Julia 的 bessel.jl 文件中，2 个用于 dpsi，1 个用于 psi .

在 julia 中结合 repmat 和转置

combining repmat and transpose in julia

vectorization

julia