Spark-Scala Try Select Statement

I'm trying to work a Try().getOrElse() statement into the select statement of my Spark DataFrame. The project I'm working on will be applied across multiple environments, but each environment names the raw data slightly differently for one particular field. I don't want to write several different functions to handle each variation. Is there a graceful way to handle exceptions within the DataFrame select statement, something like the following?

val dfFilter = dfRaw
  .select(
   Try($"some.field.nameOption1").getOrElse($"some.field.nameOption2"),
   $"some.field.abc",
   $"some.field.def"
  )

dfFilter.show(33, false)

However, I keep getting the following error, which makes sense because the field doesn't exist in this environment's raw data — but I had hoped the getOrElse statement would catch the exception.

org.apache.spark.sql.AnalysisException: No such struct field nameOption1 in...

Is there a good way to handle exceptions in a select statement in Scala Spark, or do I need to write a different function for each case?

val selectedColumns = if (dfRaw.columns.contains("some.field.nameOption1")) $"some.field.nameOption1" else $"some.field.nameOption2"

val dfFilter = dfRaw
  .select(selectedColumns, ...)
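For top-level columns, the same "pick whichever name exists" idea can be sketched in plain Scala by walking a priority-ordered list of candidate names and taking the first one the DataFrame actually has. A minimal sketch (the column names here are made up):

```scala
// Hypothetical candidate names in priority order; only "nameAlt" exists.
val available  = Seq("id", "name", "nameAlt")
val candidates = Seq("nameOption1", "nameAlt")

// Take the first candidate that is actually present.
val chosen = candidates.find(available.contains)
// chosen == Some("nameAlt")
```

Note that `df.columns` only lists top-level column names, so a check like this would not see a nested struct field such as `some.field.nameOption1` directly.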

So I revisited this question a year later, and I believe this solution is far more elegant to implement. Please let me know what others think:

// Needed for Try(...) and toDF(...); assumes a SparkSession named `spark`,
// as in spark-shell
import scala.util.Try
import spark.implicits._

// Generate a fake DataFrame
val df = Seq(
    ("1234", "A", "AAA"),
    ("1134", "B", "BBB"),
    ("2353", "C", "CCC")
    ).toDF("id", "name", "nameAlt")

// Extract the column names
val columns = df.columns

// Add a "new" column name that is NOT present in the above DataFrame
val columnsAdd = columns ++ Array("someNewColumn")

// Let's then "try" to select all of the columns
df.select(columnsAdd.flatMap(c => Try(df(c)).toOption): _*).show(false)

// Let's reduce the DF again...should yield the same results
val dfNew = df.select("id", "name")
dfNew.select(columnsAdd.flatMap(c => Try(dfNew(c)).toOption): _*).show(false)

// Results
columns: Array[String] = Array(id, name, nameAlt)
columnsAdd: Array[String] = Array(id, name, nameAlt, someNewColumn)
+----+----+-------+
|id  |name|nameAlt|
+----+----+-------+
|1234|A   |AAA    |
|1134|B   |BBB    |
|2353|C   |CCC    |
+----+----+-------+
dfNew: org.apache.spark.sql.DataFrame = [id: string, name: string]
+----+----+
|id  |name|
+----+----+
|1234|A   |
|1134|B   |
|2353|C   |
+----+----+
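The mechanism that makes this work: Try(df(c)) fails when the column is missing, .toOption turns that failure into a None, and flatMap silently drops it. The same pattern can be seen in plain Scala, using a Map as a stand-in for column resolution (`col` and the names below are made up for illustration):

```scala
import scala.util.Try

// Stand-in for df(c): succeeds for known names, throws otherwise.
val existing = Map("id" -> 1, "name" -> 2, "nameAlt" -> 3)
def col(name: String): Int =
  existing.getOrElse(name, throw new IllegalArgumentException(s"No such column: $name"))

val requested = Seq("id", "name", "nameAlt", "someNewColumn")

// Try(...).toOption is None for the missing name, so flatMap drops it.
val resolved = requested.flatMap(c => Try(col(c)).toOption)
// resolved == Seq(1, 2, 3)
```

One caveat with this approach on a real DataFrame: a misspelled column is dropped just as silently as a genuinely absent one, so it trades an explicit AnalysisException for quiet omission.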