AnalysisException: CSV data source does not support array<struct<

Question

I'm at work and need help urgently. I have a Parquet file that I need to convert to CSV. Can you help me?

Error:

AnalysisException: CSV data source does not support array<struct<company:string,dateRange:string,description:string,location:string,title:string>> data type.

I have never worked with this format before, so I can barely even print the schema. Sorry.

printSchema:

root
 |-- _id: string (nullable = true)
 |-- Locale: string (nullable = true)
 |-- workExperience: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- company: string (nullable = true)
 |    |    |-- dateRange: string (nullable = true)
 |    |    |-- description: string (nullable = true)
 |    |    |-- location: string (nullable = true)
 |    |    |-- title: string (nullable = true)

Answer 1

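You cannot save a DataFrame that contains array or struct columns as CSV, because CSV has no way to represent nested types. You need to convert the column to a string before writing.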

from pyspark.sql.functions import col
df.withColumn('workExperience', col('workExperience').cast('string')).write.csv('path')

Answer 2

The Parquet schema can be flattened using explode:

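from pyspark.sql import functions as F

df = spark.read.parquet(...)
# One output row per workExperience element; the struct fields become top-level columns
flattened_df = df.withColumn("tmp", F.explode("workExperience")) \
    .selectExpr("_id", "Locale", "tmp.*")
flattened_df.write.csv(...)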

apache-spark

parquet

pyspark