Spark 1.6 - Association Rules algorithm - Cannot be applied to (org.apache.spark.rdd.RDD[Array[String]])

I have this code to find some association rules:

import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
import org.apache.spark.rdd.RDD

val data = sc.textFile("FILE")

val transactions: RDD[Array[String]] = data.map(s => s.trim.split(','))

val ar = new AssociationRules()
  .setMinConfidence(0.8)
val results = ar.run(transactions)

results.collect().foreach { rule =>
  println("[" + rule.antecedent.mkString(",")
    + "=>"
    + rule.consequent.mkString(",") + "]," + rule.confidence)
}

But I get this error:

<console>:50: error: overloaded method value run with alternatives:
  [Item](freqItemsets: org.apache.spark.api.java.JavaRDD[org.apache.spark.mllib.fpm.FPGrowth.FreqItemset[Item]])org.apache.spark.api.java.JavaRDD[org.apache.spark.mllib.fpm.AssociationRules.Rule[Item]] <and>
  [Item](freqItemsets: org.apache.spark.rdd.RDD[org.apache.spark.mllib.fpm.FPGrowth.FreqItemset[Item]])(implicit evidence: scala.reflect.ClassTag[Item])org.apache.spark.rdd.RDD[org.apache.spark.mllib.fpm.AssociationRules.Rule[Item]]
 cannot be applied to (org.apache.spark.rdd.RDD[Array[String]])
         val results = ar.run(transactions)
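Both overloads listed in the error take a collection of FPGrowth.FreqItemset[Item], i.e. an itemset paired with the number of transactions that contain it, whereas transactions holds raw baskets. To make that shape change concrete, here is a minimal plain-Scala sketch (no Spark) that turns baskets into itemset/count records. FreqItemsetSketch and NaiveFrequentItemsets are hypothetical names, and the brute-force subset enumeration is a stand-in chosen for readability: it is not the FP-growth algorithm and only scales to tiny inputs. Items are deduplicated per basket with toSet; as far as I know, Spark's FPGrowth rejects transactions that contain duplicate items, so real data may need a similar distinct step.

```scala
// Hypothetical stand-in mirroring the shape of FPGrowth.FreqItemset:
// an itemset plus how many transactions contain it.
final case class FreqItemsetSketch(items: Set[String], freq: Long)

object NaiveFrequentItemsets {
  /** Counts, for every non-empty subset of every transaction, how many
    * transactions contain it, and keeps the subsets whose count reaches
    * ceil(minSupport * numberOfTransactions).
    * Brute force for illustration only; NOT the FP-growth algorithm. */
  def mine(transactions: Seq[Array[String]], minSupport: Double): Seq[FreqItemsetSketch] = {
    val n = transactions.size
    // Assumption: a fractional support threshold is rounded up to a count.
    val minCount = math.ceil(minSupport * n).toLong

    // toSet removes duplicate items, so each transaction contributes
    // each of its distinct subsets exactly once.
    val counts: Map[Set[String], Long] =
      transactions
        .flatMap(t => t.toSet.subsets().filter(_.nonEmpty))
        .groupBy(identity)
        .map { case (itemset, occurrences) => itemset -> occurrences.size.toLong }

    // Keep only the itemsets that are frequent enough.
    counts.toSeq.collect {
      case (itemset, count) if count >= minCount => FreqItemsetSketch(itemset, count)
    }
  }
}
```

With four baskets and minSupport = 0.5, an itemset must appear in at least two of them to be kept; these records are the kind of input the run method in the error message is asking for.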

How can I convert this RDD into the type that AssociationRules expects?

Thanks a lot!

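AssociationRules.run does not accept raw transactions: as the error message shows, it expects an RDD of FPGrowth.FreqItemset, i.e. frequent itemsets together with their counts. You first have to create an FPGrowthModel by running FPGrowth on the transactions, and then pass its freqItemsets, like this: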

import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD

val data = sc.textFile("FILE")

val transactions: RDD[Array[String]] = data.map(s => s.trim.split(','))

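val fpg = new FPGrowth()
  .setMinSupport(0.2)     // keep itemsets present in at least 20% of the transactions
  .setNumPartitions(10)

val model = fpg.run(transactions) // creates the FPGrowthModel; model.freqItemsets is an RDD[FreqItemset[String]]

val ar = new AssociationRules()
  .setMinConfidence(0.8)  // keep rules whose confidence is at least 0.8

val results = ar.run(model.freqItemsets) // RDD[AssociationRules.Rule[String]]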
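This split mirrors the two classes: FPGrowth does the counting, and AssociationRules only derives rules from those counts. The following plain-Scala sketch (no Spark; FreqItemsetSketch, RuleSketch and NaiveAssociationRules are hypothetical names) illustrates that second step. It assumes, as I understand the Spark 1.6 behaviour, that every rule has a single-item consequent: for a frequent itemset X and an item c in X, the rule X minus {c} => {c} gets confidence freq(X) / freq(X minus {c}).

```scala
// Hypothetical stand-ins for FPGrowth.FreqItemset and AssociationRules.Rule.
final case class FreqItemsetSketch(items: Set[String], freq: Long)
final case class RuleSketch(antecedent: Set[String], consequent: Set[String], confidence: Double)

object NaiveAssociationRules {
  /** For every frequent itemset with at least two items, try each item as
    * the single consequent. Confidence = freq(itemset) / freq(antecedent);
    * rules below minConfidence are dropped. */
  def run(freqItemsets: Seq[FreqItemsetSketch], minConfidence: Double): Seq[RuleSketch] = {
    // Lookup table: itemset -> number of transactions containing it.
    val freqOf: Map[Set[String], Long] = freqItemsets.map(f => f.items -> f.freq).toMap

    for {
      f <- freqItemsets if f.items.size >= 2
      consequent <- f.items.toSeq
      antecedent = f.items - consequent
      // Every subset of a frequent itemset is frequent, so the antecedent
      // should be in the table; skip it defensively if it is not.
      antFreq <- freqOf.get(antecedent).toList
      confidence = f.freq.toDouble / antFreq
      if confidence >= minConfidence
    } yield RuleSketch(antecedent, Set(consequent), confidence)
  }
}
```

For example, if {c} occurs twice and {a, c} also occurs twice, the rule c => a has confidence 2 / 2 = 1.0 and survives a 0.8 threshold, while a => c with freq({a}) = 3 only reaches 2 / 3 and is dropped. On the Spark side, FPGrowthModel in Spark 1.5+ also offers model.generateAssociationRules(0.8), which should yield the same kind of rules without constructing AssociationRules yourself; either way, the printing loop from the question works on the result because Rule exposes antecedent, consequent and confidence.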