Nd4j: using multiple threads is slower than single-thread

Question

This is my processor: 2.3 GHz Intel Core i7 (so 4 cores with hyperthreading), running macOS Sierra.

This is my program:

package nn

import org.nd4j.linalg.api.ndarray.INDArray
import org.nd4j.linalg.factory.Nd4j.randn
import org.nd4j.linalg.ops.transforms.Transforms._
import org.nd4s.Implicits._

object PerfTest extends App {

  val topology = List(784, 30, 10)
  val biases: List[INDArray] =
    topology.tail.map(size => randn(size, 1))

  val weights: List[INDArray] =
    topology.sliding(2).map(t => randn(t(1), t.head)).toList

  (1 to 100000).foreach { i =>
    val x = randn(784, 1)
    biases.zip(weights).foldLeft(List(x)) {
      case (as, (b, w)) =>
        val z = (w dot as.last) + b
        val a = sigmoid(z)
        as :+ a
    }
  }
}

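When I run the program above with the default number of threads (for nd4j on this processor, that would be 4), it takes about 28 seconds.

When I run it on 1 core (export OMP_NUM_THREADS=1), it takes 18 seconds.

Any idea why this is? Thanks.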
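For anyone who wants to reproduce the comparison, here is a minimal sketch of a timing helper in plain Scala. The names `TimeIt` and `timed` are hypothetical, and the summation is only a stand-in workload; the idea would be to wrap the `foreach` loop from the program above instead. It assumes the thread count is set through the OMP_NUM_THREADS environment variable before the JVM starts, as in the runs described here.

```scala
object TimeIt {
  // Hypothetical helper: runs `body` once and returns (result, elapsed seconds),
  // measured as wall-clock time with System.nanoTime.
  def timed[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e9)
  }

  def main(args: Array[String]): Unit = {
    // Report the thread setting the process was launched with, if any.
    val threads = sys.env.getOrElse("OMP_NUM_THREADS", "unset")
    // Stand-in workload; replace with the loop being measured.
    val (sum, seconds) = timed { (1 to 1000000).map(_.toLong).sum }
    println(f"OMP_NUM_THREADS=$threads sum=$sum elapsed=$seconds%.3f s")
  }
}
```

A single run like this includes JVM warm-up, so repeating the measurement a few times under each setting should give a fairer comparison.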

Answer 1

I could not find an explanation for this, so I migrated to Breeze, which is 6 times faster with no extra effort.

scala

nd4j