PySpark: flatten embedded structs all into the same level
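Is there a simple way to get from the top layout in the picture to the bottom one, where all the columns sit next to each other at the same level, with no nesting?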
You can update the source struct by adding the fields of the inner struct e_struct to it directly, like this:
from pyspark.sql.functions import col, struct

# get all fields of source struct except the inner struct e_struct
source_cols = [col(f"source.{c}") for c in df.select(col("source.*")).columns if c != "e_struct"]
# get all fields of the inner struct e_struct
e_struct_cols = [col(f"source.e_struct.{c}") for c in df.select(col("source.e_struct.*")).columns]
# combine them
new_struct_cols = source_cols + e_struct_cols
# update source column
df = df.withColumn("source", struct(*new_struct_cols))
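The snippet above handles one known inner struct. If the nesting depth is not known in advance, the same idea can be applied recursively. The following is only a minimal sketch, not part of the original answer: leaf_paths and flatten_df are hypothetical helper names, the field names id, a, b, c and d in the example schema are made up for illustration, and it assumes field names contain no dots and that the underscore-joined aliases do not collide. It walks the plain dict returned by df.schema.jsonValue(), so the path logic can be checked without a Spark session.

```python
def leaf_paths(schema_json, prefix=""):
    """Return dotted paths of every non-struct field in a schema dict
    shaped like the output of df.schema.jsonValue()."""
    paths = []
    for field in schema_json["fields"]:
        path = f"{prefix}{field['name']}"
        ftype = field["type"]
        # Nested structs are dicts with type == "struct"; simple types are
        # plain strings, and arrays/maps are dicts with another type tag.
        if isinstance(ftype, dict) and ftype.get("type") == "struct":
            paths.extend(leaf_paths(ftype, prefix=path + "."))
        else:
            paths.append(path)
    return paths


def flatten_df(df):
    """Select every leaf field as a top-level column (hypothetical helper).
    Assumes no dots in field names and no alias collisions."""
    from pyspark.sql.functions import col  # imported lazily; needs pyspark

    paths = leaf_paths(df.schema.jsonValue())
    return df.select([col(p).alias(p.replace(".", "_")) for p in paths])


# Illustrative schema only: source and e_struct come from the answer above,
# the remaining field names are assumptions.
example_schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": True, "metadata": {}},
        {"name": "source", "nullable": True, "metadata": {}, "type": {
            "type": "struct",
            "fields": [
                {"name": "a", "type": "string", "nullable": True, "metadata": {}},
                {"name": "b", "type": "string", "nullable": True, "metadata": {}},
                {"name": "e_struct", "nullable": True, "metadata": {}, "type": {
                    "type": "struct",
                    "fields": [
                        {"name": "c", "type": "string", "nullable": True, "metadata": {}},
                        {"name": "d", "type": "string", "nullable": True, "metadata": {}},
                    ],
                }},
            ],
        }},
    ],
}

print(leaf_paths(example_schema))
```

Calling flatten_df(df) on a DataFrame with this shape would then give top-level columns named after the underscore-joined paths. Arrays of structs are left as single columns, since flattening them would require an explode step rather than a plain select.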