How to save FP-Growth model FrequentItemSet results to a text file?
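I am trying to save the frequent itemsets generated by the model to a text file. The code is taken from the FPGrowth example in the Spark MLlib library.
Calling saveAsTextFile directly on the model's frequent itemsets writes what look like object references instead of the actual values.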
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.SparkConf;
import org.apache.spark.mllib.fpm.FPGrowth;
import org.apache.spark.mllib.fpm.FPGrowthModel;
import org.apache.spark.api.java.function.Function;
import java.util.Arrays;
import java.util.List;

public class Test_ItemFrequency {
    public static void main(String args[]) {
        SparkConf conf = new SparkConf().setAppName("FP-Growth_ItemFrequency").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> data = sc.textFile("/data/mllib/sample_fpgrowth.txt");
        JavaRDD<List<String>> transactions = data.map(new Function<String, List<String>>() {
            public List<String> call(String line) {
                String[] parts = line.split(" ");
                return Arrays.asList(parts);
            }
        });
        FPGrowth fpg = new FPGrowth().setMinSupport(0.2).setNumPartitions(1);
        FPGrowthModel<String> model = fpg.run(transactions);
        model.freqItemsets().saveAsTextFile("/home/data/itemset");
        sc.stop();
    }
}
The output generated in the text file looks like this:
org.apache.spark.mllib.fpm.FPGrowth$FreqItemset@754881de
org.apache.spark.mllib.fpm.FPGrowth$FreqItemset@73022909
org.apache.spark.mllib.fpm.FPGrowth$FreqItemset@25df2591
org.apache.spark.mllib.fpm.FPGrowth$FreqItemset@774b6aca
org.apache.spark.mllib.fpm.FPGrowth$FreqItemset@100ba1db
org.apache.spark.mllib.fpm.FPGrowth$FreqItemset@72a388b2
org.apache.spark.mllib.fpm.FPGrowth$FreqItemset@2e8cc8da
Can anyone explain how to solve this? Thanks in advance.
Using a lambda expression:
model.freqItemsets()
    .toJavaRDD()
    .map((Function<FPGrowth.FreqItemset<String>, String>) fi -> fi.javaItems() + " -> " + fi.freq())
    .saveAsTextFile("/home/data/itemset");
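The lines in the question's output have the shape of Java's default Object.toString() (class name, @, hash code), so each FPGrowth.FreqItemset<String> is first mapped to a String. The result is a JavaRDD<String>, which can then be saved as readable text.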
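To see why the mapping step helps without needing a Spark installation, here is a minimal plain-Java sketch. `ItemsetFormatDemo` and its nested `Itemset` class are hypothetical stand-ins invented for illustration; they are not Spark classes, and only the accessor names `javaItems()` and `freq()` mirror the ones used above. The sketch assumes, as the question's output suggests, that the element type does not override `toString()`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; nothing here comes from Spark.
class ItemsetFormatDemo {

    // Stand-in for FPGrowth.FreqItemset<String>. It deliberately does not
    // override toString(), so printing it uses Object.toString().
    static final class Itemset {
        private final List<String> items;
        private final long freq;

        Itemset(List<String> items, long freq) {
            this.items = items;
            this.freq = freq;
        }

        List<String> javaItems() { return items; }
        long freq() { return freq; }
    }

    // Same expression as in the answer: List.toString() plus the count.
    static String format(Itemset fi) {
        return fi.javaItems() + " -> " + fi.freq();
    }

    public static void main(String[] args) {
        List<Itemset> itemsets = Arrays.asList(
            new Itemset(Arrays.asList("a"), 3L),
            new Itemset(Arrays.asList("a", "b"), 2L));

        // Default Object.toString(): class name, '@', hex hash code.
        System.out.println(itemsets.get(0));

        // Mapping each element to a String first yields readable lines.
        List<String> lines = itemsets.stream()
            .map(ItemsetFormatDemo::format)
            .collect(Collectors.toList());
        lines.forEach(System.out::println); // prints "[a] -> 3" then "[a, b] -> 2"
    }
}
```

The first printed line has the same `Class@hash` shape as the file contents in the question, while the mapped lines contain the items and their frequency. In the Spark version, the `map` call on the JavaRDD plays the role of the stream `map` here.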
Solution without a lambda expression:
model.freqItemsets()
    .toJavaRDD()
    .map(new Function<FPGrowth.FreqItemset<String>, String>() {
        @Override
        public String call(FPGrowth.FreqItemset<String> fi) {
            return fi.javaItems() + " -> " + fi.freq();
        }
    })
    .saveAsTextFile("/home/data/itemset");