Apache Spark Parquet: Cannot build an empty group
I'm using Apache Spark 2.1.1 (I was on 2.1.0 and saw the same behavior; I switched today).
I have a dataset:
root
 |-- muons: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- reco::Candidate: struct (nullable = true)
 |    |    |-- qx3_: integer (nullable = true)
 |    |    |-- pt_: float (nullable = true)
 |    |    |-- eta_: float (nullable = true)
 |    |    |-- phi_: float (nullable = true)
 |    |    |-- mass_: float (nullable = true)
 |    |    |-- vertex_: struct (nullable = true)
 |    |    |    |-- fCoordinates: struct (nullable = true)
 |    |    |    |    |-- fX: float (nullable = true)
 |    |    |    |    |-- fY: float (nullable = true)
 |    |    |    |    |-- fZ: float (nullable = true)
 |    |    |-- pdgId_: integer (nullable = true)
 |    |    |-- status_: integer (nullable = true)
 |    |    |-- cachePolarFixed_: struct (nullable = true)
 |    |    |-- cacheCartesianFixed_: struct (nullable = true)
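As you can see, there are 3 empty structs in this schema (reco::Candidate, cachePolarFixed_ and cacheCartesianFixed_). I am 100% sure that I can read and manipulate this dataset however I like. However, when I try to write it to disk as Parquet, I get the following exception: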
dsReduced.write.format("parquet").save(outputPathName):
java.lang.IllegalStateException: Cannot build an empty group
at org.apache.parquet.Preconditions.checkState(Preconditions.java:91)
at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:622)
at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:497)
at org.apache.parquet.schema.Types$Builder.named(Types.java:286)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:535)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertField.apply(ParquetSchemaConverter.scala:534)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertField.apply(ParquetSchemaConverter.scala:533)
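So, basically, I'd like to understand whether this is a bug or expected behavior. I also assume it has something to do with the empty structs. Any help would be much appreciated!
Update: I quickly created a reduced version, and it works fine! Any insights would be very helpful!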
VK
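Parquet cannot write empty structs: a Parquet group must contain at least one field, which is why the schema conversion fails with "Cannot build an empty group".
For more information, see https://issues.apache.org/jira/browse/SPARK-20593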
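To see which fields would trigger the error before attempting the write, one option is to walk the schema and list every struct that has no fields. The following is only a minimal sketch, not something taken from the JIRA ticket: it assumes the schema is available in Spark's JSON form (in PySpark, for example, from `dsReduced.schema.jsonValue()`), and the names `find_empty_structs` and `field` are hypothetical helpers introduced for illustration. The sample schema is a hand-written, shortened copy of the one in the question.

```python
def find_empty_structs(dtype, path=""):
    """Return the paths of all struct types that have no fields.

    `dtype` is a Spark schema in JSON form: either a type name string
    such as "float", or a dict whose "type" is "struct", "array" or "map".
    """
    if not isinstance(dtype, dict):
        return []  # primitive type such as "integer" or "float"
    kind = dtype.get("type")
    if kind == "struct":
        fields = dtype.get("fields", [])
        if not fields:
            return [path or "<root>"]
        found = []
        for f in fields:
            child = f"{path}.{f['name']}" if path else f["name"]
            found.extend(find_empty_structs(f["type"], child))
        return found
    if kind == "array":
        # "[]" marks that the path goes through the array elements
        return find_empty_structs(dtype["elementType"], path + "[]")
    if kind == "map":
        return (find_empty_structs(dtype["keyType"], path + "{key}")
                + find_empty_structs(dtype["valueType"], path + "{value}"))
    return []


def field(name, dtype):
    """Hypothetical helper: build one field entry of a JSON schema."""
    return {"name": name, "type": dtype, "nullable": True, "metadata": {}}


def struct(*fields):
    """Hypothetical helper: build a JSON struct type from field entries."""
    return {"type": "struct", "fields": list(fields)}


# Shortened, hand-written copy of the schema from the question.
schema = struct(
    field("muons", {
        "type": "array",
        "containsNull": True,
        "elementType": struct(
            field("reco::Candidate", struct()),
            field("pt_", "float"),
            field("vertex_", struct(
                field("fCoordinates", struct(
                    field("fX", "float"),
                    field("fY", "float"),
                    field("fZ", "float"),
                )),
            )),
            field("pdgId_", "integer"),
            field("cachePolarFixed_", struct()),
            field("cacheCartesianFixed_", struct()),
        ),
    }),
)

empty_paths = find_empty_structs(schema)
for p in empty_paths:
    print(p)
```

For this sample schema the function reports the three empty structs inside the `muons` elements. Those fields would then have to be dropped or given at least one field before the Parquet write can succeed.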
VK