Type change in spark dataframe struct
I have the following schema:
root
|-- Id: long (nullable = true)
|-- element: struct (containsNull = true)
| |-- Amount: double (nullable = true)
| |-- Currency: string (nullable = true)
I would like to change the type of Amount to integer. It doesn't work with withColumn, because the type stays the same:
df.withColumn("element.Amount", $"element.Amount".cast(sql.types.IntegerType))
How can I change the type of a column inside a struct?
If you cannot fix the problem at the source, you can cast:
case class Amount(amount: Double, currency: String)
case class Row(id: Long, element: Amount)
val df = Seq(Row(1L, Amount(0.96, "EUR"))).toDF
val dfCasted = df.withColumn(
"element", $"element".cast("struct<amount: integer, currency: string>")
)
dfCasted.show
// +---+--------+
// | id| element|
// +---+--------+
// | 1|[0, EUR]|
// +---+--------+
dfCasted.printSchema
// root
// |-- id: long (nullable = false)
// |-- element: struct (nullable = true)
// | |-- amount: integer (nullable = true)
// | |-- currency: string (nullable = true)
In simple cases you can also try rebuilding the tree:
import org.apache.spark.sql.functions._
dfCasted.withColumn(
"element",
struct($"element.amount".cast("integer"), $"element.currency")
)
// org.apache.spark.sql.DataFrame = [id: bigint, element: struct<col1: int, currency: string>]
but it doesn't scale to complex trees.
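One wrinkle with the rebuilt tree above is that the casted field loses its name and comes out as col1. A minimal sketch of a fix, assuming the same dfCasted as above, is to alias each rebuilt field inside struct so the original names are preserved:

```scala
import org.apache.spark.sql.functions._

// Rebuild the struct, aliasing each field so the schema keeps
// the original field names instead of the generated col1:
val dfRebuilt = dfCasted.withColumn(
  "element",
  struct(
    $"element.amount".cast("integer").alias("amount"),
    $"element.currency".alias("currency")
  )
)

dfRebuilt.printSchema
// root
//  |-- id: long (nullable = false)
//  |-- element: struct (nullable = false)
//  |    |-- amount: integer (nullable = true)
//  |    |-- currency: string (nullable = true)
```

This still requires listing every field by hand, so for deeply nested schemas the single top-level cast with a full DDL string remains the more maintainable option.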