为什么并行 pi 估计比顺序 pi 估计慢?

Why is parallel pi estimation slower than sequential pi estimation?

所以我尝试根据蒙特卡洛方法提出不同的 pi 估计实现。有3个实现-

  1. 顺序 - 更快的性能结果
  2. Scala 并行 collections - 最慢的结果
  3. Scala futures - 比并行 collections 快,但比顺序

我在 AWS 上使用新的 m4xlarge 实例进行了这些基准测试,使用 Scalameter、运行 它们在 sbt shell.

中完成

这是顺序赛跑者 -

import java.lang.Math.random

object Runner extends App {

  val numberOfPoints = if (args.length > 0) {
    args(0).toInt
  } else {
    3 // Some default
  }

  import org.scalameter._

  def isWithinBounds(pair: (Double, Double)) = pair._1 * pair._1 + pair._2 * pair._2 < 1

  def piPredictor(numberOfPoints: Int) = {
    (1 to numberOfPoints).map(_ => (random(), random())).count(isWithinBounds) * 4.0 / numberOfPoints
  }

  def runExperiment(numberOfPoints: Int) = withWarmer(new Warmer.Default) measure {
    piPredictor(numberOfPoints)
  }

  def reporter(t: (Quantity[Double], Int)): Unit = println(s"It took ${t._1} for ${t._2} points")

  def raise10To(exponent: Int) = Math.pow(10, exponent).toInt

  (1 to numberOfPoints).map(raise10To).map(numberOfPoints => (runExperiment(numberOfPoints), numberOfPoints)).foreach(reporter)

}

这是并行 collections 转轮 - 请注意,唯一真正的区别是使用 par 方法。

import java.lang.Math.random
import scala.collection.parallel.CollectionConverters._

object Runner extends App {

  val numberOfPoints = if (args.length > 0) {
    args(0).toInt
  } else {
    3 // Some default
  }

  import org.scalameter._

  def isWithinBounds(pair: (Double, Double)) = pair._1 * pair._1 + pair._2 * pair._2 < 1

  def piPredictor(numberOfPoints: Int) = {
    (1 to numberOfPoints).par.map(_ => (random(), random())).count(isWithinBounds) * 4.0 / numberOfPoints
  }

  def runExperiment(numberOfPoints: Int) = withWarmer(new Warmer.Default) measure {
    piPredictor(numberOfPoints)
  }

  def reporter(t: (Quantity[Double], Int)): Unit = println(s"It took ${t._1} for ${t._2} points")

  def raise10To(exponent: Int) = Math.pow(10, exponent).toInt

  (1 to numberOfPoints).map(raise10To).map(numberOfPoints => (runExperiment(numberOfPoints), numberOfPoints)).foreach(reporter)

}

最后这是 Futures

的亚军
import java.lang.Math.random

import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}

object Runner extends App {

  implicit val executionContext: ExecutionContext = scala.concurrent.ExecutionContext.Implicits.global

  val numberOfPoints = if (args.length > 0) {
    args(0).toInt
  } else {
    3 // Some default
  }

  val numberOfThreads = if (args.length > 1) {
    args(1).toInt
  } else {
    2 // 2 threads as default
  }

  import org.scalameter._

  def isWithinBounds(pair: (Double, Double)) = pair._1 * pair._1 + pair._2 * pair._2 < 1

  def piPredictor(numberOfPoints: Int) = {
    def compute(n: Int) = (1 to n).map(_ => (random(), random())).count(isWithinBounds)
    val partsF: Seq[Future[Int]] = (1 to numberOfThreads).map(_ => Future(compute(numberOfPoints / numberOfThreads)))
    partsF.map(x => Await.result(x, Duration.Inf)).sum * 4.0 / numberOfPoints
  }

  def runExperiment(numberOfPoints: Int) = withWarmer(new Warmer.Default) measure {
    piPredictor(numberOfPoints)
  }

  def reporter(t: (Quantity[Double], Int)): Unit = println(s"It took ${t._1} for ${t._2} points")

  def raise10To(exponent: Int) = Math.pow(10, exponent).toInt

  (1 to numberOfPoints).map(raise10To).map(numberOfPoints => (runExperiment(numberOfPoints), numberOfPoints)).foreach(reporter)

}

这是每个基准测试所用的时间 -

Sequential run reports on m4xlarge
sbt:scala-parallel-programs> run 7
[info] running montecarlo.sequential.Runner 7
It took 0.050859 ms for 10 points
It took 0.057532 ms for 100 points
It took 0.198189 ms for 1000 points
It took 1.391727 ms for 10000 points
It took 10.373825 ms for 100000 points
It took 75.328267 ms for 1000000 points
It took 1162.63124 ms for 10000000 points

并行 Collections -

Parallel collections report - Very slow compared to sequential collecctions
info] running montecarlo.parallelcollections.Runner 7
It took 1.677748 ms for 10 points
It took 1.061964 ms for 100 points
It took 0.562431 ms for 1000 points
It took 3.582251 ms for 10000 points
It took 30.981812 ms for 100000 points
It took 332.464702 ms for 1000000 points
It took 3252.623377 ms for 10000000 points

最后是未来亚军 -

Future implementation running times on m4xlarge

1 thread - on average, slower than sequential...
[info] running montecarlo.fixedfutures.Runner 7 1
It took 0.245685 ms for 10 points
It took 0.260486 ms for 100 points
It took 0.366152 ms for 1000 points
It took 0.799212 ms for 10000 points
It took 6.76789 ms for 100000 points
It took 94.150547 ms for 1000000 points
It took 1090.711087 ms for 10000000 points

2 threads - wayyy slower than sequential... I don't know, it makes no sense...
[info] running montecarlo.fixedfutures.Runner 7 2
It took 0.226309 ms for 10 points
It took 0.192723 ms for 100 points
It took 0.241403 ms for 1000 points
It took 2.342587 ms for 10000 points
It took 22.278208 ms for 100000 points
It took 229.475656 ms for 1000000 points
It took 2400.402471 ms for 10000000 points

10 threads - whatt.... this is weird...
[info] running montecarlo.fixedfutures.Runner 7 10
It took 0.193699 ms for 10 points
It took 0.31988 ms for 100 points
It took 0.62585 ms for 1000 points
It took 3.040552 ms for 10000 points
It took 24.241802 ms for 100000 points
It took 310.822098 ms for 1000000 points / runMain 38s
It took 3088.061321 ms for 10000000 points

我期待性能提升,因为这是一个非常并行的问题。并且主要期望在 count

等操作中获得良好的性能提升

java.math.random()同步的 (如 docs 所述),这意味着您的 map 主要是 顺序 .

您可能想改用 java.util.concurrent.ThreadLocalRandom.current()