Scala Seq for Spark in Java?
I need to use SparkContext instead of JavaSparkContext for the accumulableCollection (if you don't agree, check out the linked question and answer it there, please!).
Clarified question: SparkContext is available in Java, but it wants a Scala Seq. How do I make it happy -- in Java?
I have this code to do a simple jsc.parallelize that I was using with JavaSparkContext, but SparkContext wants a Scala collection. I thought here I was building a Scala Range and converting it to a Java list; I'm not sure how to get that Range to be a Scala Seq, which is what parallelize on SparkContext is asking for.
// The JavaSparkContext way, was trying to get around MAXINT limit, not the issue here
// setup bogus Lists of size M and N for parallelize
//List<Integer> rangeM = rangeClosed(startM, endM).boxed().collect(Collectors.toList());
//List<Integer> rangeN = rangeClosed(startN, endN).boxed().collect(Collectors.toList());
Next comes the money line: how do I create a Scala Seq in Java to parallelize?
// these lists above need to be Scala objects now that we've switched to SparkContext
scala.collection.Seq<Integer> rangeMscala = scala.collection.immutable.List(startM to endM); // pseudocode -- this is the part I can't express in Java
// setup sparkConf and create SparkContext
... SparkConf setup
SparkContext jsc = new SparkContext(sparkConf);
RDD<Integer> dataSetMscala = jsc.parallelize(rangeMscala);
You should use it this way:
// Build a Scala Range directly from Java; Range is already a scala.collection.Seq
scala.collection.immutable.Range rangeMscala =
    scala.collection.immutable.Range$.MODULE$.apply(1, 10);
SparkContext sc = new SparkContext();
// From Java, parallelize needs the Seq, the number of slices, and a ClassTag
RDD<Object> dataSetMscala =
    sc.parallelize(rangeMscala, 3, scala.reflect.ClassTag$.MODULE$.Object());
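Follow-up note: if you are starting from a plain Java List rather than a Range, one way to get a Scala Seq from Java is scala.collection.JavaConverters. This is only a rough sketch assuming a Scala 2.11/2.12-era Spark build; the list contents, app name, master, and slice count are placeholders, not anything from your setup:

import scala.collection.JavaConverters;
import scala.collection.Seq;
import scala.reflect.ClassTag;
import scala.reflect.ClassTag$;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.rdd.RDD;
import java.util.Arrays;
import java.util.List;

// Placeholder config and data, just to make the sketch self-contained
SparkConf sparkConf = new SparkConf().setAppName("seq-from-java").setMaster("local[*]");
List<Integer> javaList = Arrays.asList(1, 2, 3, 4, 5);

// Convert java.util.List -> scala.collection.mutable.Buffer, which is a scala.collection.Seq
Seq<Integer> scalaSeq = JavaConverters.asScalaBufferConverter(javaList).asScala();

// Calling parallelize from Java needs an explicit slice count and a ClassTag for the element type
ClassTag<Integer> tag = ClassTag$.MODULE$.apply(Integer.class);

SparkContext sc2 = new SparkContext(sparkConf);
RDD<Integer> rdd = sc2.parallelize(scalaSeq, sc2.defaultParallelism(), tag);

Note that on newer, Scala 2.13-based Spark builds, parallelize expects an immutable Seq, so an extra toSeq() on the converted buffer may be needed.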
Hope this helps! Best regards