Change the datatype of any field of an ArrayType column in PySpark

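I want to change the datatype of the field "value", which sits inside the ArrayType column "readings". Each element of "readings" is a struct with two fields, "key" and "value". The current schema is: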

root
 |-- name: string (nullable = true)
 |-- languagesAtSchool: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- languagesAtSchool1: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- _id: integer (nullable = true)
 |-- languagesAtWork: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- currentState: string (nullable = true)
 |-- previousState: double (nullable = true)
 |-- readings: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- value: integer (nullable = true)
 |    |    |-- key: string (nullable = true)

The expected schema is:

root
 |-- name: string (nullable = true)
 |-- languagesAtSchool: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- languagesAtSchool1: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- _id: integer (nullable = true)
 |-- languagesAtWork: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- currentState: string (nullable = true)
 |-- previousState: double (nullable = true)
 |-- readings: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- value: string (nullable = true)
 |    |    |-- key: string (nullable = true)

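Use the higher-order function transform to rebuild each element of the array. The snippets below cast to integer; substitute the target type you need (string for the expected schema in the question).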

Option 1: suitable when you want to drop some fields. Name the fields you want to keep inside struct; uses a SQL expression.

from pyspark.sql.functions import expr
df1 = df.withColumn('readings', expr('transform(readings, x -> struct(cast(x.value as integer) as value, x.key))'))
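
When the struct has more than a couple of fields, writing this SQL expression by hand gets error-prone. Below is a minimal sketch of a helper that builds the transform expression from a list of field names. The function build_transform_expr and its parameters are hypothetical names for illustration, not part of PySpark, and the sketch assumes the field names are simple identifiers that need no backtick quoting.

```python
def build_transform_expr(column, fields, casts):
    """Build a Spark SQL transform() expression that rebuilds each struct.

    column: name of the array<struct> column
    fields: struct field names to keep, in output order
    casts:  mapping of field name -> Spark SQL type to cast to
    """
    parts = []
    for name in fields:
        if name in casts:
            # Cast the field and alias it back to its original name
            parts.append(f"cast(x.{name} as {casts[name]}) as {name}")
        else:
            # Keep the field unchanged, with an explicit alias
            parts.append(f"x.{name} as {name}")
    return f"transform({column}, x -> struct({', '.join(parts)}))"


# Cast 'value' to string and keep 'key', as in the expected schema of the question
readings_expr = build_transform_expr("readings", ["value", "key"], {"value": "string"})
print(readings_expr)
```

The resulting string can be passed to expr, for example df.withColumn('readings', expr(readings_expr)). Fields left out of the list are dropped from the struct, which matches the use case of this option.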

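Option 2: suitable when you do not want to name the fields in the struct; also a SQL expression. Note that struct(x, ...) embeds the original element x as a nested field next to the cast value, so the element type gains a level of nesting instead of matching the expected schema exactly.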

df1 = df.withColumn('readings', expr('transform(readings, x -> struct(x, cast(x.value as integer)))'))

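Option 3: suitable when you do not want to type out the fields of the struct; does not use a SQL expression. F.transform and Column.withField require Spark 3.1 or later.
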
from pyspark.sql import functions as F
df1 = df.withColumn('readings', F.transform('readings', lambda x: x.withField('value', x['value'].cast('int'))))


root
 |-- name: string (nullable = true)
 |-- languagesAtSchool: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- languagesAtSchool1: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- _id: integer (nullable = true)
 |-- languagesAtWork: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- currentState: string (nullable = true)
 |-- previousState: double (nullable = true)
 |-- readings: array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- value: integer (nullable = true)
 |    |    |-- key: string (nullable = true)