Remove mutable sum var

This is an entropy calculation based on Jeff Atwood's answer to "How to calculate the entropy of a file?", which is in turn based on http://en.wikipedia.org/wiki/Entropy_(information_theory)
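For reference, the quantity being computed is the Shannon entropy of the word distribution. The symbols here are my own notation, not taken from the linked answer: c_i is the number of occurrences of the i-th distinct word and N is the total number of words.

```latex
H = -\sum_{i} p_i \log_2 p_i, \qquad p_i = \frac{c_i}{N}
```

For the sample string used in this question ("measure" five times, "here" once, so N = 6), this works out to H = -(5/6 log2(5/6) + 1/6 log2(1/6)), which is roughly 0.65 bits.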

object MeasureEntropy extends App {

  val s = "measure measure here measure measure measure"

  def entropyValue(s: String) = {

    val m = s.split(" ").toList.groupBy((word: String) => word).mapValues(_.length.toDouble)
    var result: Double = 0.0;
    val len = s.split(" ").length;

    m map {
      case (key, value: Double) =>
        {
          var frequency: Double = value / len;
          result -= frequency * (scala.math.log(frequency) / scala.math.log(2));
        }
    }

    result;
  }

  println(entropyValue(s))
}

I would like to improve this by removing the mutable state associated with:

var result: Double = 0.0;

How can I fold result into a single calculation inside the map function?

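Use foldLeft, or in this case /:, which is syntactic sugar for it (note that /: is deprecated as of Scala 2.13 in favour of calling foldLeft directly):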

(0d /: m) {case (result, (key,value)) => 
  val frequency = value / len
  result - frequency * (scala.math.log(frequency) / scala.math.log(2))
}

Documentation: http://www.scala-lang.org/files/archive/api/current/index.html#scala.collection.immutable.Map@/:B(op:(B,A)=>B):B

A simple sum will do the trick:

m.map {
  case (_, value) =>
    val frequency = value / len
    -frequency * (scala.math.log(frequency) / scala.math.log(2))
}.sum

It can be written with foldLeft like this:

  def entropyValue(s: String) = {
    val m = s.split(" ").toList.groupBy((word: String) => word).mapValues(_.length.toDouble)
    val len = s.split(" ").length
    m.foldLeft(0.0)((r, t) => r - ((t._2 / len) * (scala.math.log(t._2 / len) / scala.math.log(2))))
  }
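
Putting the suggestions above together, here is a minimal self-contained sketch of a version with no mutable state. The object name EntropySketch and the log2 helper are my own additions for illustration, not part of the original code. It also avoids mapValues, which is deprecated as of Scala 2.13, by mapping over the grouped values directly.

```scala
object EntropySketch {
  // Hypothetical helper: log base 2 via change of base.
  private def log2(x: Double): Double = math.log(x) / math.log(2)

  def entropyValue(s: String): Double = {
    val words = s.split(" ")
    val len = words.length.toDouble
    words
      .groupBy(identity) // distinct word -> all of its occurrences
      .values
      .map { occurrences =>
        val frequency = occurrences.length / len
        -frequency * log2(frequency) // this word's contribution to the entropy
      }
      .sum
  }

  def main(args: Array[String]): Unit =
    // Prints roughly 0.65 for the sample string from the question.
    println(entropyValue("measure measure here measure measure measure"))
}
```

Compared with the foldLeft variants, map followed by sum keeps the per-word term and the accumulation separate, which some readers find easier to follow. Both approaches remove the var.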