Apache Spark: NoSuchMethodError
I am trying to run some tests on Apache Spark (v1.3.0), and I have a simple Java 8 class:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class WordCount {

    private JavaSparkContext ctx;
    private String inputFile, outputFile;

    public WordCount(String inputFile, String outputFile) {
        this.inputFile = inputFile;
        this.outputFile = outputFile;
        // Initialize Spark Conf
        ctx = new JavaSparkContext("local", "WordCount",
                System.getenv("SPARK_HOME"), System.getenv("JARS"));
    }

    public static void main(String... args) {
        String inputFile = "/home/workspace/spark/src/main/resources/inferno.txt"; // args[0];
        String outputFile = "/home/workspace/spark/src/main/resources/dv"; // args[1];
        WordCount wc = new WordCount(inputFile, outputFile);
        wc.doWordCount();
        wc.close();
    }

    public void doWordCount() {
        long start = System.currentTimeMillis();
        JavaRDD<String> inputRdd = ctx.textFile(inputFile);
        // Emit a (word, 1) pair for every word, then sum the counts per word
        JavaPairRDD<String, Integer> count = inputRdd.flatMapToPair((String s) -> {
            List<Tuple2<String, Integer>> list = new ArrayList<>();
            Arrays.asList(s.split(" ")).forEach(s1 -> list.add(new Tuple2<String, Integer>(s1, 1)));
            return list;
        }).reduceByKey((x, y) -> x + y);
        Tuple2<String, Integer> max = count.max(new Tuple2Comparator());
        System.out.println(max);
        // count.saveAsTextFile(outputFile);
        long end = System.currentTimeMillis();
        System.out.println(String.format("Time in ms is: %d", end - start));
    }

    public void close() {
        ctx.stop();
    }
}
The Tuple2Comparator comparator class is:
import java.io.Serializable;
import java.util.Comparator;

import scala.Tuple2;

public class Tuple2Comparator implements Comparator<Tuple2<String, Integer>>, Serializable {

    private static final long serialVersionUID = 103955884403243585L;

    // Compares tuples by their count, in descending order (o2 before o1)
    @Override
    public int compare(Tuple2<String, Integer> o1, Tuple2<String, Integer> o2) {
        return o2._2() - o1._2();
    }
}
When I run it, I get the following exception:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.api.java.JavaPairRDD.max(Ljava/util/Comparator;)Lscala/Tuple2;
at it.conker.spark.base.WordCount.doWordCount2(WordCount.java:69)
at it.conker.spark.base.WordCount.main(WordCount.java:41)
This is my pom file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>it.conker.spark</groupId>
    <artifactId>learning-spark-by-example</artifactId>
    <name>Learning Spark by example</name>
    <packaging>jar</packaging>
    <version>0.0.1</version>

    <dependencies>
        <dependency> <!-- Spark dependency -->
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.3.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <properties>
        <java.version>1.8</java.version>
    </properties>

    <build>
        <pluginManagement>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.1</version>
                    <configuration>
                        <source>${java.version}</source>
                        <target>${java.version}</target>
                    </configuration>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>
</project>
I run the class in Eclipse.
Can anyone tell me where I am going wrong?
I believe this is SPARK-3266, which will be fixed in all upcoming Spark maintenance releases (see my pull request).
One workaround, which I have not tested, is to cast count to JavaRDDLike<Tuple2<String, Integer>, ?> before calling max(), since a similar workaround worked for someone else (see my comment on JIRA).
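A minimal sketch of that untested workaround (it assumes, as the JIRA discussion suggests, that the erased max() signature on the JavaRDDLike interface does exist in the 1.3.0 bytecode, unlike the one on JavaPairRDD the exception complains about):

// In doWordCount(), replace the direct call
//     Tuple2<String, Integer> max = count.max(new Tuple2Comparator());
// with a call through the interface (import org.apache.spark.api.java.JavaRDDLike):
JavaRDDLike<Tuple2<String, Integer>, ?> rddLike = count; // plain upcast, no unchecked cast needed
Tuple2<String, Integer> max = rddLike.max(new Tuple2Comparator());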
You should change the scope of the Spark dependency to compile; if you are trying to run it on your local machine, it will then work fine.
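Concretely, in the pom above that means replacing provided with compile (which is also Maven's default when no scope is given), so the Spark classes on the runtime classpath match the ones the project was compiled against:

        <dependency> <!-- Spark dependency -->
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.3.0</version>
            <scope>compile</scope> <!-- was: provided -->
        </dependency>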