Scala 中具有不可变集合的素数的试验划分

Question

我正在尝试通过重写基本练习来学习 Scala 和函数式编程思想。目前我在生成素数的天真方法上遇到了麻烦"trial division"。

下面描述的问题是我无法以保持效率的函数式风格重写著名的算法，因为我没有合适的不可变数据结构，比如List，但不仅在头上操作速度快，也是在最后。

我开始编写 java 代码，该代码针对每个奇数测试其是否被已找到的素数整除（受测试值的平方根限制）- 如果没有除数，则将其添加到列表末尾被发现。

    List<Integer> primes = new ArrayList<>();
    primes.add(2);
    int cur = 3;
    while (primes.size() < 100000) {
        for (Integer x : primes) {
            if (x * x > cur) {
                primes.add(cur);
                break;
            }
            if (cur % x == 0) {
                break;
            }
        }
        cur += 2;
    }

现在我尝试在 "functional way" 中重写它 - 使用递归而不是循环没有问题，但我坚持使用不可变集合。核心思想如下：

http://ideone.com/4DQ6mi

def primes(n: Int) = {
    @tailrec
    def divisibleByAny(x: Int, list: List[Int]): Boolean = {
        if (list.isEmpty) false else {
            val h = list.head
            h * h <= x && (x % h == 0 || divisibleByAny(x, list.tail))
        }
    }
    @tailrec
    def morePrimes(from: Int, prev: List[Int]): List[Int] = {
        if (prev.size == n) prev else
            morePrimes(from + 2, if (divisibleByAny(from, prev)) prev else prev :+ from)
    }
    morePrimes(3, List(2))
}

但是慢 - 如果我理解正确的话，因为添加到不可变列表末尾的操作需要创建新的副本全部。

我搜索了文档以找到更合适的数据结构，并尝试用不可变队列替换列表，因为据说：

Adding items to the queue always has cost O(1) ... Removing an item is on average O(1).

但还是更慢:

http://ideone.com/v8BsuQ

def primes(n: Int) = {
    @tailrec
    def divisibleByAny(x: Int, list: Queue[Int]): Boolean = {
        if (list.isEmpty) false else {
            val (h, t) = list.dequeue
            h * h <= x && (x % h == 0 || divisibleByAny(x, t))
        }
    }
    @tailrec
    def morePrimes(from: Int, prev: Queue[Int]): Queue[Int] = {
        if (prev.size == n) prev else
            morePrimes(from + 2, if (divisibleByAny(from, prev)) prev else prev.enqueue(from))
    }
    morePrimes(3, Queue(2))
}

出了什么问题还是我遗漏了什么？

P.S。我相信还有其他算法可以生成更适合函数式风格的素数。我想我看过一些纸。但现在我对这个感兴趣，或者更准确地说，是对合适的数据结构的存在感兴趣。

Answer 1

根据 http://docs.scala-lang.org/overviews/collections/performance-characteristics.html Vectors 有一个用于附加、前置和查找的摊销常数成本。事实上，在您的解决方案中使用向量而不是列表要快得多

def primes(n: Int) = {
  @tailrec
  def divisibleByAny(x: Int, list: Vector[Int]): Boolean = {
    if (list.isEmpty) false else {
      val (h +: t) = list
      h * h <= x && (x % h == 0 || divisibleByAny(x, t))
    }
  }
  @tailrec
  def morePrimes(from: Int, prev: Vector[Int]): Vector[Int] = {
    if (prev.length == n) prev else
      morePrimes(from + 2, if (divisibleByAny(from, prev)) prev else prev :+ from)
  }
  morePrimes(3, Vector(2))
}

http://ideone.com/x3k4A3

Answer 2

我认为你有两个主要选择

使用向量 - 这比用于附加的列表更好。它是一个 Bitmapped Trie 数据结构 (http://en.wikipedia.org/wiki/Trie)。附加到（即平均 O(1)）

或者...可能是您不想要的答案

使用像 ListBuffer 这样的可变数据结构 - 不可变性很适合尝试实现，并且应该是你的集合 - 但有时出于效率原因，你可能会使用可变结构。关键是要确保它不会“泄漏”您的 classes。如果你查看 List.scala 实现，你会发现 ListBuffer 在内部被大量使用。但是，它在离开 class 之前变回了一个列表。如果它对于核心 Scala 库来说足够好，那么在有保证的特殊情况下使用它可能没问题。

Answer 3

除了使用Vector，还可以考虑使用高阶函数而不是递归。这也是一种完全有效的函数式风格。在我的机器上，divisibleByAny 的以下实现比运行 primes(1000000):

时的@Pyetras tailrec 实现快大约 8 倍

def divisibleByAny(x: Int, list: Vector[Int]): Boolean =
  list.view.takeWhile(el => el * el <= x).exists(x % _ == 0)

Scala 中具有不可变集合的素数的试验划分

Trial division for primes with immutable collections in Scala

algorithm

functional-programming

scala

data-structures