Spark: Extract Values from Output RDD
I am new to Spark programming. I am trying to extract values from an RDD that currently gives me the following output:
(CBI10006,(Some(Himanshu Vasani),None))
(CBI10004,(Some(Sonam Petro),Some(8500)))
(CBI10003,(None,Some(3000)))
I want to extract the values above into the following form:
(CBI10006,Himanshu Vasani,'')
(CBI10004,Sonam Petro,8500)
(CBI10003,'',3000)
I tried the flatMap approach below:
joined.flatMap{case(f1,f2) => (f1,(f2._1,f2._2))}
but I get the following error:
type mismatch;
found : (String, (Option[String], Option[String]))
required: TraversableOnce[?]
**joined.flatMap{case(f1,f2) => (f1,(f2._1,f2._2))}**
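Use map() instead. flatMap expects the function to return a collection (a TraversableOnce), and a tuple is not one, which is what the type mismatch is reporting. map() produces exactly one output element per input element, which is what you want here: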
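// Sample data mirroring the joined RDD; assumes an existing SparkSession named `spark`.
val data = Seq(
  ("CBI10006", (Some("Himanshu Vasani"), None)),
  ("CBI10004", (Some("Sonam Petro"), Some(8500))),
  ("CBI10003", (None, Some(3000)))
)
spark.sparkContext
  .parallelize(data)
  // Unwrap each Option, substituting an empty string for None.
  // y._2 is Option[Int] here, so the third tuple element widens to Any.
  .map { case (x, y) => (x, y._1.getOrElse(""), y._2.getOrElse("")) }
  .foreach(println)
// output (line order may vary across partitions):
// (CBI10006,Himanshu Vasani,)
// (CBI10004,Sonam Petro,8500)
// (CBI10003,,3000)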
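The difference between map and flatMap can also be seen without Spark, since plain Scala collections follow the same rule. The sketch below is only an illustration: the object name `ExtractOptionValues` and the local `joined` Seq are hypothetical stand-ins for the real RDD, typed as (String, (Option[String], Option[String])) to match the error message in the question. It shows that flatMap compiles once the tuple is wrapped in a collection, although map remains the simpler choice for a one-to-one transformation.

```scala
object ExtractOptionValues {
  // Hypothetical stand-in for the joined RDD's rows, typed as in the error message.
  val joined: Seq[(String, (Option[String], Option[String]))] = Seq(
    ("CBI10006", (Some("Himanshu Vasani"), None)),
    ("CBI10004", (Some("Sonam Petro"), Some("8500"))),
    ("CBI10003", (None, Some("3000")))
  )

  // map: exactly one output row per input row.
  val viaMap: Seq[(String, String, String)] =
    joined.map { case (id, (name, amount)) =>
      (id, name.getOrElse(""), amount.getOrElse(""))
    }

  // flatMap: the function must return a collection, so the tuple is wrapped in a Seq.
  val viaFlatMap: Seq[(String, String, String)] =
    joined.flatMap { case (id, (name, amount)) =>
      Seq((id, name.getOrElse(""), amount.getOrElse("")))
    }

  def main(args: Array[String]): Unit =
    viaMap.foreach(println)
}
```

Both `viaMap` and `viaFlatMap` hold the same three rows. Because both Option fields are strings in this sketch, the result is a well-typed (String, String, String) rather than a tuple containing Any.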