AnalysisException: CSV data source does not support array<struct<
I'm at work and need help urgently.
I have a Parquet file that I need to convert to CSV. Can you help me?
Error:
AnalysisException: CSV data source does not support array<struct<company:string,dateRange:string,description:string,location:string,title:string>> data type.
I've never used this format before, so I couldn't even print the schema. Sorry.
printSchema:
root
 |-- _id: string (nullable = true)
 |-- Locale: string (nullable = true)
 |-- workExperience: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- company: string (nullable = true)
 |    |    |-- dateRange: string (nullable = true)
 |    |    |-- description: string (nullable = true)
 |    |    |-- location: string (nullable = true)
 |    |    |-- title: string (nullable = true)
You cannot save a DataFrame that contains array/struct-type columns as CSV. You need to cast the column to a string before writing.
from pyspark.sql.functions import col
df.withColumn('workExperience', col('workExperience').cast('string')).write.csv('path')
The Parquet schema can be flattened using explode:
from pyspark.sql import functions as F

df = spark.read.parquet(...)
flattened_df = df.withColumn("tmp", F.explode("workExperience")) \
    .selectExpr("_id", "Locale", "tmp.*")
flattened_df.write.csv(...)