EMR Spark working in a Java main, but not in a Java function

I would like to know why this works:

import java.util.ArrayList;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public final class JavaSparkPi {

    public static void main(String[] args) throws Exception {
        // Spark is initialized directly inside main.
        SparkConf sparkConf = new SparkConf().setMaster("yarn-cluster").setAppName("mySparkApp");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        ArrayList<Integer> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add(i);
        }

        JavaRDD<Integer> dataSet = jsc.parallelize(list)
                .map(s -> 2 * s)
                .map(s -> 5 * s);

        int weirdStuff = dataSet.reduce((a, b) -> (a + b) / 2);
        System.out.println("stuff is " + weirdStuff);
        jsc.stop();
    }
}

but this does not:

import java.util.ArrayList;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public final class JavaSparkPi {

    // Spark is initialized inside an instance method instead of main.
    private void startWorkingOnMicroSpark() {
        SparkConf sparkConf = new SparkConf().setMaster("yarn-cluster").setAppName("mySparkApp");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        ArrayList<Integer> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add(i);
        }

        JavaRDD<Integer> dataSet = jsc.parallelize(list)
                .map(s -> 2 * s)
                .map(s -> 5 * s);

        int weirdStuff = dataSet.reduce((a, b) -> (a + b) / 2);
        System.out.println("weirdStuff is " + weirdStuff);
        jsc.stop();
    }

    public static void main(String[] args) throws Exception {
        JavaSparkPi jsp = new JavaSparkPi();
        jsp.startWorkingOnMicroSpark();
    }
}

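I am developing Spark applications on EMR. The only difference I can find between the two projects is that one has the Spark part written in main and the other does not. I launched both of them in EMR as Spark applications with the --class JavaSparkPi argument.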

Here is the status of the failed step:

Status: FAILED

Reason:

Log file: s3://mynewbucket/Logs/j-3AKSZXK7FKMX6/steps/s-2MT0SB910U3TE/stderr.gz

Details: Exception in thread "main" org.apache.spark.SparkException: Application application_1501228129826_0003 finished with failed status

JAR location: command-runner.jar

Main class: None

Arguments: spark-submit --deploy-mode cluster --class JavaSparkPi s3://mynewbucket/Code/SparkAWS.jar

Action on failure: Continue

And here is the successful one:

JAR location: command-runner.jar
Main class: None
Arguments: spark-submit --deploy-mode cluster --class JavaSparkPi s3://mynewbucket/Code/SparkAWS.jar
Action on failure: Continue

Put those Spark initialization calls into main.

SparkConf sparkConf = new SparkConf().setMaster("yarn-cluster").setAppName("mySparkApp");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
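
One possible reading of this advice, as a minimal sketch: main creates the SparkConf and JavaSparkContext and hands the context to the worker method, so the work can still live outside main. The helper name computeWeirdStuff and the choice to make it static are assumptions for illustration, not part of the original code, and the master string is kept exactly as in the question. This has not been checked against the failing EMR step, so treat it as something to try rather than a confirmed fix.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public final class JavaSparkPi {

    // Hypothetical helper: it receives an already-initialized context
    // instead of creating its own SparkConf and JavaSparkContext.
    private static int computeWeirdStuff(JavaSparkContext jsc) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add(i);
        }

        JavaRDD<Integer> dataSet = jsc.parallelize(list)
                .map(s -> 2 * s)
                .map(s -> 5 * s);

        // Same reduce function as in the question.
        return dataSet.reduce((a, b) -> (a + b) / 2);
    }

    public static void main(String[] args) {
        // Spark initialization stays in main, as the answer suggests.
        // The master string is copied unchanged from the question.
        SparkConf sparkConf = new SparkConf().setMaster("yarn-cluster").setAppName("mySparkApp");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);
        try {
            System.out.println("weirdStuff is " + computeWeirdStuff(jsc));
        } finally {
            // Stop the context even if the job throws.
            jsc.stop();
        }
    }
}
```

The step would be submitted the same way as in the question, with spark-submit --deploy-mode cluster --class JavaSparkPi followed by the JAR location.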