PySpark: flatten embedded structs all into the same level

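Is there a simple way to get from the top schema to the bottom one (as in the pictures), so that all columns sit next to each other at the same level instead of being nested?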

You can update the source struct by adding the fields of the inner struct to it directly, like this:

from pyspark.sql.functions import col, struct

# get all fields of the source struct except the inner struct e_struct
source_cols = [col(f"source.{c}") for c in df.select(col("source.*")).columns if c != "e_struct"]

# get all fields of the inner struct e_struct
e_struct_cols = [col(f"source.e_struct.{c}") for c in df.select(col("source.e_struct.*")).columns]

# combine them
new_struct_cols = source_cols + e_struct_cols

# update source column
df = df.withColumn("source", struct(*new_struct_cols))
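The code above handles one known inner struct. If you want every nested field at the top level regardless of depth, one option is to walk the schema recursively and collect the dotted path of each non-struct field. The sketch below does that in plain Python on the dict returned by `df.schema.jsonValue()`, so the traversal can be checked without a running Spark session. The helper name `leaf_paths`, the example fields `id`, `a` and `b`, and the choice to leave arrays and maps unexpanded are all assumptions for illustration; it also assumes no field name contains a dot.

```python
from typing import Any, Dict, List


def leaf_paths(struct_json: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted paths to every non-struct field of a schema.

    `struct_json` is assumed to be the dict produced by
    StructType.jsonValue(), e.g. df.schema.jsonValue().
    Field names containing dots are not handled (assumption).
    """
    paths: List[str] = []
    for field in struct_json["fields"]:
        path = f"{prefix}{field['name']}"
        ftype = field["type"]
        # Nested structs appear as dicts with "type": "struct"; recurse into them.
        if isinstance(ftype, dict) and ftype.get("type") == "struct":
            paths.extend(leaf_paths(ftype, prefix=path + "."))
        else:
            # Primitive types (e.g. "string") and arrays/maps are kept as one column.
            paths.append(path)
    return paths


# Hypothetical schema: a top-level "id" plus "source" containing "a" and e_struct.
schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": True, "metadata": {}},
        {"name": "source", "nullable": True, "metadata": {}, "type": {
            "type": "struct",
            "fields": [
                {"name": "a", "type": "string", "nullable": True, "metadata": {}},
                {"name": "e_struct", "nullable": True, "metadata": {}, "type": {
                    "type": "struct",
                    "fields": [
                        {"name": "b", "type": "integer", "nullable": True, "metadata": {}},
                    ],
                }},
            ],
        }},
    ],
}

print(leaf_paths(schema))  # ['id', 'source.a', 'source.e_struct.b']
```

With a real DataFrame the paths could then be selected as top-level columns, for example `df.select([col(p).alias(p.replace(".", "_")) for p in leaf_paths(df.schema.jsonValue())])`. Aliasing with the full path joined by underscores is just one naming choice; it avoids collisions when two nested structs share a field name.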