Checking if a nested column exists in Spark
Suppose we have the following DataFrame:
case class SubRecord(x: Int)
case class ArrayElement(foo: String, bar: Int, vals: Array[Double])
case class Record(
  an_array: Array[Int], a_map: Map[String, String],
  a_struct: SubRecord, an_array_of_structs: Array[ArrayElement])
val df = sc.parallelize(Seq(
  Record(Array(1, 2, 3), Map("foo" -> "bar"), SubRecord(1),
    Array(
      ArrayElement("foo", 1, Array(1.0, 2.0, 2.0)),
      ArrayElement("bar", 2, Array(3.0, 4.0, 5.0)))),
  Record(Array(4, 5, 6), Map("foz" -> "baz"), SubRecord(2),
    Array(ArrayElement("foz", 3, Array(5.0, 6.0)),
      ArrayElement("baz", 4, Array(7.0, 8.0))))
)).toDF
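If the column (path) exists, we get its value here; otherwise we get an exception.
In that case I would like to get a value such as 'NOT_FOUND' instead. Is this possible?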
Expected output
My approach is shown below.
Please share if anyone has a better solution.
I implemented the following method:
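import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{lit, when}
import scala.util.{Failure, Success, Try}
// Resolve the path once; df(path) throws if the path cannot be resolved.
def getValue(df: DataFrame, path: String): Column = {
  val resolved = Try(df(path))
  when(lit(resolved.isSuccess),
    resolved match {
      case Success(c) => c
      case Failure(_) => lit("")
    })
    .otherwise("NOT_FOUND").as(path)
}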
- getValue() -> checks whether the path exists in the DataFrame and returns either its value or 'NOT_FOUND'
OUTPUT
I found a better solution: we can just use safeCol() in place of the col() method.
def DataFrameSafeCol(df: DataFrame)(path: String): Column = {
  Try(df(path)) match {
    case Success(x) => x
    case Failure(_) => lit("NOT_FOUND")
  }
}
Then we can use it as below:
val safeCol = DataFrameSafeCol(df)(_) // partially apply the curried function to df
df.select(safeCol("column_name")).show
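If you would rather not rely on catching the exception, one alternative is to check the path against df.schema before selecting it. The following is only a minimal sketch, not part of the original answer: SchemaPaths and hasPath are hypothetical names, the match is case-sensitive (Spark's own resolution is case-insensitive by default), it does not handle backtick-quoted names that contain dots, and it reports map keys such as a_map.foo as absent because map keys are data rather than schema.

```scala
import org.apache.spark.sql.types.{ArrayType, DataType, StructType}

// Hypothetical helper (not a Spark API): walks the schema along a dot-separated path.
object SchemaPaths {
  // Returns true if every segment of the path names a field in the schema.
  // Arrays of structs are looked through, so "an_array_of_structs.foo" is found.
  def hasPath(schema: StructType, path: String): Boolean = {
    def loop(current: DataType, parts: List[String]): Boolean = parts match {
      case Nil => true // every segment matched
      case name :: rest =>
        current match {
          case st: StructType =>
            st.fields.find(_.name == name) match {
              case Some(field) => loop(field.dataType, rest)
              case None        => false
            }
          // Step into the element type without consuming a path segment.
          case ArrayType(elementType, _) => loop(elementType, parts)
          // Leaf types (and maps) have no named children in the schema.
          case _ => false
        }
    }
    loop(schema, path.split('.').toList)
  }
}
```

With this in place, a column could be chosen with something like: if (SchemaPaths.hasPath(df.schema, path)) df(path) else lit("NOT_FOUND").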