Spark 1.6 - Association Rules algorithm - Cannot be applied to (org.apache.spark.rdd.RDD[Array[String]])
I have this code to find some association rules:
import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
import org.apache.spark.rdd.RDD

val data = sc.textFile("FILE")
val transactions: RDD[Array[String]] = data.map(s => s.trim.split(','))
val ar = new AssociationRules()
  .setMinConfidence(0.8)
val results = ar.run(transactions)
results.collect().foreach { rule =>
  println("[" + rule.antecedent.mkString(",")
    + "=>"
    + rule.consequent.mkString(",") + "]," + rule.confidence)
}
But I get this error:
<console>:50: error: overloaded method value run with alternatives:
[Item](freqItemsets: org.apache.spark.api.java.JavaRDD[org.apache.spark.mllib.fpm.FPGrowth.FreqItemset[Item]])org.apache.spark.api.java.JavaRDD[org.apache.spark.mllib.fpm.AssociationRules.Rule[Item]] <and>
[Item](freqItemsets: org.apache.spark.rdd.RDD[org.apache.spark.mllib.fpm.FPGrowth.FreqItemset[Item]])(implicit evidence: scala.reflect.ClassTag[Item])org.apache.spark.rdd.RDD[org.apache.spark.mllib.fpm.AssociationRules.Rule[Item]]
cannot be applied to (org.apache.spark.rdd.RDD[Array[String]])
val results = ar.run(transactions)
How can I convert this RDD into the type that AssociationRules needs?
Thanks a lot!
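As the error message shows, AssociationRules.run expects an RDD[FreqItemset[Item]], not the raw transactions. You first have to create an FPGrowthModel from the transactions and then pass its freqItemsets, like this: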
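import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD

// Each line of the file is one transaction: a comma-separated list of items
val data = sc.textFile("FILE")
val transactions: RDD[Array[String]] = data.map(s => s.trim.split(','))

// Mine the frequent itemsets first
val fpg = new FPGrowth()
  .setMinSupport(0.2)
  .setNumPartitions(10)
val model = fpg.run(transactions) // creates the FPGrowthModel

// Then generate the rules from the frequent itemsets
val ar = new AssociationRules()
  .setMinConfidence(0.8)
val results = ar.run(model.freqItemsets)
results.collect().foreach { rule =>
  println("[" + rule.antecedent.mkString(",") + "=>"
    + rule.consequent.mkString(",") + "]," + rule.confidence)
}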
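To see why the rules are built from frequent itemsets rather than from the transactions directly, here is a minimal plain-Scala sketch (no Spark involved) of the confidence calculation, confidence(X => Y) = freq(X ∪ Y) / freq(X). The object name ConfidenceSketch, the rules helper, and the counts in freq are all made up for illustration. It assumes single-item consequents, which as far as I know is what MLlib's AssociationRules generates; it is not the library's actual implementation.

```scala
// Plain-Scala illustration of deriving rules from frequent itemsets.
// All names and counts here are hypothetical, not part of the Spark API.
object ConfidenceSketch {
  // Frequent itemsets with their occurrence counts (made-up numbers).
  val freq: Map[Set[String], Long] = Map(
    Set("a") -> 15L,
    Set("b") -> 35L,
    Set("a", "b") -> 12L
  )

  // For every itemset with more than one item, try each item as the
  // consequent and keep the rule if its confidence reaches minConfidence.
  // Returns (antecedent, consequent, confidence) triples.
  def rules(minConfidence: Double): Seq[(Set[String], String, Double)] =
    for {
      (itemset, count) <- freq.toSeq
      if itemset.size > 1
      consequent <- itemset.toSeq
      antecedent = itemset - consequent
      // The antecedent must itself be a known frequent itemset.
      antecedentCount <- freq.get(antecedent).toSeq
      confidence = count.toDouble / antecedentCount
      if confidence >= minConfidence
    } yield (antecedent, consequent, confidence)

  def main(args: Array[String]): Unit =
    rules(0.8).foreach { case (ante, cons, conf) =>
      println("[" + ante.mkString(",") + "=>" + cons + "]," + conf)
    }
}
```

The counts of the itemsets are the only input the calculation needs, which is why ar.run takes model.freqItemsets and rejects an RDD[Array[String]].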