Change the data type of a field inside an ArrayType column in PySpark
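I want to change the data type of the field "value", which sits inside the array-type column "readings", from integer to string. Each element of "readings" is a struct with two fields, "key" and "value".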
root
|-- name: string (nullable = true)
|-- languagesAtSchool: array (nullable = true)
| |-- element: string (containsNull = true)
|-- languagesAtSchool1: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- _id: integer (nullable = true)
|-- languagesAtWork: array (nullable = true)
| |-- element: string (containsNull = true)
|-- currentState: string (nullable = true)
|-- previousState: double (nullable = true)
|-- readings: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- value: integer (nullable = true)
| | |-- key: string (nullable = true)
The expected schema is:
root
|-- name: string (nullable = true)
|-- languagesAtSchool: array (nullable = true)
| |-- element: string (containsNull = true)
|-- languagesAtSchool1: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- _id: integer (nullable = true)
|-- languagesAtWork: array (nullable = true)
| |-- element: string (containsNull = true)
|-- currentState: string (nullable = true)
|-- previousState: double (nullable = true)
|-- readings: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- value: string (nullable = true)
| | |-- key: string (nullable = true)
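Use the higher-order function transform to rebuild each element of the array. The casts below target integer; put the type you need, such as string for the expected schema above, in the cast instead.
Option 1, a SQL expression, suitable when you also want to drop some fields: every field you keep must be listed by name in the struct.
from pyspark.sql.functions import expr
df1 = df.withColumn('readings', expr('transform(readings, x -> struct(cast(x.value as integer) as value, x.key))'))
or
Option 2, also a SQL expression, for when you do not want to name the fields in the struct. Note that passing x itself to struct() nests the whole original element as a field next to the cast value.
df1 = df.withColumn('readings', expr('transform(readings, x -> struct(x, cast(x.value as integer)))'))
Option 3, without a SQL expression, for when you do not want to type out the fields of the struct (F.transform and withField require Spark 3.1 or later):
import pyspark.sql.functions as F
df1 = df.withColumn('readings', F.transform('readings', lambda x: x.withField('value', x['value'].cast('int'))))
df1.printSchema()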
root
|-- name: string (nullable = true)
|-- languagesAtSchool: array (nullable = true)
| |-- element: string (containsNull = true)
|-- languagesAtSchool1: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- _id: integer (nullable = true)
|-- languagesAtWork: array (nullable = true)
| |-- element: string (containsNull = true)
|-- currentState: string (nullable = true)
|-- previousState: double (nullable = true)
|-- readings: array (nullable = true)
| |-- element: struct (containsNull = false)
| | |-- value: integer (nullable = true)
| | |-- key: string (nullable = true)