The method run(JavaRDD<Sequence>) is not applicable for the arguments (JavaRDD<List<String>>)

I get the following error when trying to run the PrefixSpan algorithm in Spark MLlib:

The method run(JavaRDD<Sequence>) in the type PrefixSpan is not applicable for the arguments (JavaRDD<List<String>>)

The code I saw on the website is

JavaRDD<List<List<Integer>>> sequences = sc.parallelize(Arrays.asList(
    Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3)),
    Arrays.asList(Arrays.asList(1), Arrays.asList(3, 2), Arrays.asList(1, 2)),
    Arrays.asList(Arrays.asList(1, 2), Arrays.asList(5)),
    Arrays.asList(Arrays.asList(6))), 2);
PrefixSpan prefixSpan = new PrefixSpan().setMinSupport(0.5).setMaxPatternLength(5);
PrefixSpanModel<Integer> model = prefixSpan.run(sequences);
for (PrefixSpan.FreqSequence<Integer> freqSeq: model.freqSequences().toJavaRDD().collect()) {
    System.out.println(freqSeq.javaSequence() + ", " + freqSeq.freq());
}

My code is

List<List<String>> sequences = createLists(featuresForAlgo);

JavaRDD<List<String>> rdd = sc.parallelize(sequences);

PrefixSpan prefixSpan = new PrefixSpan()
    .setMinSupport(0.5)
    .setMaxPatternLength(5);
PrefixSpanModel<String> model = prefixSpan.run(rdd);
for (PrefixSpan.FreqSequence<Integer> freqSeq: model.freqSequences().toJavaRDD().collect()) {
    System.out.println(freqSeq.javaSequence() + ", " + freqSeq.freq());
}

The call prefixSpan.run(rdd) is what gives the error. Any idea why I am getting it? As far as I know, a List is a sequence.

Thanks

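The error is a bit misleading, but if you look at the source code of the PrefixSpan class, you will see that the parameter of the run method is documented as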

@param data ordered sequences of itemsets stored as Java Iterable of Iterables
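For context, the Java overload of run is generic, declared roughly as <Item, Itemset extends Iterable<Item>, Sequence extends Iterable<Itemset>>, which is why the compiler message mentions Sequence even though no such class exists. The sketch below is not Spark code: it uses a hypothetical stand-in method named run with the same bounds, taking a plain List instead of a JavaRDD so that it compiles without Spark, to show which argument shapes satisfy those bounds.

```java
import java.util.Arrays;
import java.util.List;

public class BoundsSketch {
    // Hypothetical stand-in that mirrors the type bounds on PrefixSpan.run.
    // It accepts a List rather than a JavaRDD and only counts the sequences.
    public static <Item, Itemset extends Iterable<Item>, Sequence extends Iterable<Itemset>>
    int run(List<Sequence> data) {
        return data.size();
    }

    public static void main(String[] args) {
        // Each sequence is a list of itemsets, each itemset a list of items:
        // Sequence = List<List<String>>, Itemset = List<String>, Item = String.
        List<List<List<String>>> ok = Arrays.asList(
            Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c")),
            Arrays.asList(Arrays.asList("a")));
        System.out.println(run(ok)); // prints 2

        // With only one level of nesting, Sequence = List<String> would force
        // Itemset = String, and String is not an Iterable, so this is rejected:
        // List<List<String>> bad = Arrays.asList(Arrays.asList("a", "b"));
        // run(bad); // does not compile
    }
}
```

The commented-out call fails for the same reason as the code in the question: the element type of the RDD has to be iterable two levels deep.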

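So prefixSpan.run expects a JavaRDD<List<List<String>>>: an RDD of sequences, where each sequence is a list of itemsets and each itemset is a list of items. In your code, you could do this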

// Wrap the whole list as a single sequence; each inner List<String> becomes one itemset.
List<List<List<String>>> seqNew = new ArrayList<>();
seqNew.add(sequences);
JavaRDD<List<List<String>>> rdd = sc.parallelize(seqNew);
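Note that wrapping everything in one outer list produces an RDD holding a single sequence. If each inner List<String> is instead meant to be its own sequence of individual items (an assumption about the data, since createLists is not shown), another option is to turn every item into a one-element itemset. A minimal sketch in plain Java, where the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SequenceShapes {
    // Turns each item into a one-element itemset, so every inner list
    // stays a separate sequence: [a, b] becomes [[a], [b]].
    public static List<List<List<String>>> toSingletonItemsets(List<List<String>> sequences) {
        List<List<List<String>>> result = new ArrayList<>();
        for (List<String> sequence : sequences) {
            List<List<String>> itemsets = new ArrayList<>();
            for (String item : sequence) {
                itemsets.add(Collections.singletonList(item));
            }
            result.add(itemsets);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> sequences = Arrays.asList(
            Arrays.asList("a", "b"),
            Arrays.asList("a", "c"));
        System.out.println(toSingletonItemsets(sequences));
        // prints [[[a], [b]], [[a], [c]]]
        // With Spark on the classpath, the result could be passed to
        // sc.parallelize(...) to obtain a JavaRDD<List<List<String>>>.
    }
}
```

Either way, the loop variable in the question should be declared as PrefixSpan.FreqSequence<String> rather than FreqSequence<Integer>, so that it matches PrefixSpanModel<String>.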