Checking if a nested column exists in Spark

Consider the following:

// Sample data with nested struct, map, array and array-of-struct columns
// (spark-shell style; assumes sc and spark.implicits._ are in scope)
case class SubRecord(x: Int)
case class ArrayElement(foo: String, bar: Int, vals: Array[Double])
case class Record(
  an_array: Array[Int], a_map: Map[String, String], 
  a_struct: SubRecord, an_array_of_structs: Array[ArrayElement])

val df = sc.parallelize(Seq(
  Record(Array(1, 2, 3), Map("foo" -> "bar"), SubRecord(1),
         Array(
           ArrayElement("foo", 1, Array(1.0, 2.0, 2.0)),
           ArrayElement("bar", 2, Array(3.0, 4.0, 5.0)))),
  Record(Array(4, 5, 6), Map("foz" -> "baz"), SubRecord(2),
         Array(ArrayElement("foz", 3, Array(5.0, 6.0)), 
               ArrayElement("baz", 4, Array(7.0, 8.0))))
)).toDF

If the column (path) exists we get its value, but if it does not we get an exception. In that case I would like to get a value such as 'NOT_FOUND' instead. Is that possible?
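
For example, with the df defined above (a minimal sketch; "a_struct.y" is just an illustrative path that does not exist in the schema):

df.select(df("a_struct.x")).show   // existing nested path: resolves and shows the values

// df("a_struct.y")                // missing path: throws org.apache.spark.sql.AnalysisException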

Expected output: the column's value when the path exists, and a placeholder such as 'NOT_FOUND' when it does not.

My approach is shown below.

Please share if anyone has a better solution.

I implemented the following method:

import scala.util.{Failure, Success, Try}
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{lit, when}

def getValue(df: DataFrame, path: String): Column = {
  // Try(df(path)) fails (and is caught) on the driver if the path cannot be
  // resolved against the schema, so the check happens before the query runs.
  when(lit(Try(df(path)).isSuccess),
    Try(df(path)) match {
      case Success(column) => column
      case Failure(_)      => lit("")
    })
    .otherwise("NOT_FOUND").as(path)
}
  • getValue() -> checks whether the path exists in the DataFrame and returns its value if present, or "NOT_FOUND" otherwise
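
A usage sketch with the df from above ("a_struct.z" is a hypothetical path used only to trigger the missing-column branch):

df.select(
  getValue(df, "a_struct.x"),  // existing nested path: resolves to its value
  getValue(df, "a_struct.z")   // hypothetical missing path: falls back to "NOT_FOUND"
).show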

Output: existing paths come back with their values, missing paths come back as "NOT_FOUND".

I found a better solution: we can simply use safeCol() in place of the col() method.

def DataFrameSafeCol(df: DataFrame)(path: String): Column = {
  // Return the resolved column when the path exists, a literal fallback otherwise.
  Try(df(path)) match {
    case Success(column) => column
    case Failure(_)      => lit("NOT_FOUND")
  }
}

Then we can use it as below:

val safeCol = DataFrameSafeCol(df)(_) // partially apply the curried function: String => Column
df.select(safeCol("column_name")).show
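
The same pattern extends to several paths in one select; aliasing each column keeps the output readable when the fallback literal is returned ("a_struct.z" is again a hypothetical missing path):

df.select(
  safeCol("a_struct.x").as("a_struct_x"),  // resolves normally
  safeCol("a_struct.z").as("a_struct_z")   // hypothetical missing path -> lit("NOT_FOUND")
).show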