Must a Spark Streaming developer install Hadoop on their computer?
I am learning Spark Streaming. My demo works fine when the master is set to "local[2]", but when I point the master at my local cluster running in standalone mode, I get this error:

Lost executor 2 (removed): Unable to create executor due to java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.

Note that I submit the code from IDEA:
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.springframework.context.annotation.Bean;
import org.springframework.stereotype.Component;

import scala.Tuple2;

@Component
public final class JavaNetworkWordCount {

    private static final String SPACE = " ";

    @Bean("test")
    public void test() throws Exception {
        // Create a StreamingContext with a batch interval of 1 second
        SparkConf conf = new SparkConf()
                .setJars(new String[]{"E:\\project\\spark-demo\\target\\spark-demo-0.0.1-SNAPSHOT.jar"})
                .setMaster("spark://10.4.41.93:7077")
                .set("spark.driver.host", "127.0.0.1")
                .setAppName("JavaWordCount");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

        // Create a DStream that will connect to hostname:port, like localhost:9999
        JavaReceiverInputDStream<String> lines = jssc.socketTextStream("192.168.2.51", 9999);

        // Split each line into words
        JavaDStream<String> words = lines.flatMap(x -> Arrays.asList(x.split(SPACE)).iterator());

        // Count each word in each batch
        JavaPairDStream<String, Integer> pairs = words.mapToPair(s -> new Tuple2<>(s, 1));
        JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey((i1, i2) -> i1 + i2);

        // Print the first ten elements of each RDD generated in this DStream to the console
        wordCounts.print();

        jssc.start();              // Start the computation
        jssc.awaitTermination();   // Wait for the computation to terminate
    }
}
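The exception in the executor log appears to come from Hadoop's start-up check for a local Hadoop installation. A simplified sketch of that look-up (the real logic lives in org.apache.hadoop.util.Shell; this is only an illustration, not the actual Hadoop source):

import java.io.FileNotFoundException;

public class HadoopHomeCheck {
    public static void main(String[] args) throws FileNotFoundException {
        // Roughly what Hadoop does when it resolves its home directory:
        // first the JVM system property, then the environment variable.
        String home = System.getProperty("hadoop.home.dir");
        if (home == null) {
            home = System.getenv("HADOOP_HOME");
        }
        if (home == null) {
            // This is the message that shows up in the executor failure above.
            throw new FileNotFoundException("HADOOP_HOME and hadoop.home.dir are unset.");
        }
        System.out.println("Hadoop home resolved to: " + home);
    }
}

Since the failure is reported by the executors, the check has to pass on the worker machines, not just on the machine where the driver (IDEA) runs.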
So that was it: I downloaded Hadoop, set HADOOP_HOME to point at it, restarted the cluster, and the error disappeared.
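If changing the cluster environment were not an option, a driver-only workaround that is sometimes used is to set the hadoop.home.dir system property programmatically before any Hadoop class is loaded. A minimal sketch, assuming Hadoop is unpacked at E:\hadoop (a hypothetical path); note that this only satisfies the check inside the driver JVM, so executors on the workers still need HADOOP_HOME in their own environment:

public class DriverSideHadoopHome {
    public static void main(String[] args) {
        // Hypothetical local path; must be set before the first Hadoop class loads.
        System.setProperty("hadoop.home.dir", "E:\\hadoop");
        // ... then build the SparkConf / JavaStreamingContext as in the snippet above ...
    }
}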