SQLContext changing String field to Long: Spark 1.5

Question

I have saved my records in Parquet format and am using Spark 1.5. But when I try to fetch the columns, it throws an exception:

java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.spark.unsafe.types.UTF8String.

This field was saved as a String when the Parquet file was written. Here is the sample code and its output:

logger.info("troubling thing is ::" + 
    sqlContext.sql(fileSelectQuery).schema().toString()); 

DataFrame df = sqlContext.sql(fileSelectQuery);

JavaRDD<Row> rdd2 = df.toJavaRDD();

The first line in the code (the logger call) prints the following:

troubling thing is ::StructType(StructField(batch_id,StringType,true))

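But the exception follows immediately afterwards.

Any idea why the field is being treated as a Long? (Yes, one peculiar thing about the column is that it is always a number, e.g. a timestamp.)

Any help is appreciated.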

Answer 1

So I was able to find a solution to the problem.

I had not started using Scala.
I did some more searching and reading and found this:

http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery

Notice that the data types of the partitioning columns are automatically inferred. Currently, numeric data types and string type are supported. Sometimes users may not want to automatically infer the data types of the partitioning columns. For these use cases, the automatic type inference can be configured by spark.sql.sources.partitionColumnTypeInference.enabled, which is default to true. When type inference is disabled, string type will be used for the partitioning columns.
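
To make the quoted behavior concrete, here is a minimal plain-Java sketch of why an all-numeric partition value ends up typed as a number. It is a simplified illustration and not Spark's actual inference code: the class name, the inferType helper, the sample value, and the collapsing of all integer types into "LongType" are assumptions made for the example. It presumes that batch_id is a partitioning column, which the question does not state explicitly.

```java
public class PartitionTypeInferenceSketch {

    // Simplified stand-in for partition value inference (NOT Spark's real code).
    // Returns the name of the type a partition value would be given.
    static String inferType(String rawValue, boolean inferenceEnabled) {
        if (!inferenceEnabled) {
            // With inference disabled, every partition value stays a string.
            return "StringType";
        }
        try {
            Long.parseLong(rawValue);
            return "LongType";
        } catch (NumberFormatException notALong) {
            // Not an integer value; try a floating-point value next.
        }
        try {
            Double.parseDouble(rawValue);
            return "DoubleType";
        } catch (NumberFormatException notADouble) {
            return "StringType";
        }
    }

    public static void main(String[] args) {
        // Hypothetical value, as it might appear in a path like .../batch_id=1446012345678/
        String raw = "1446012345678";
        System.out.println(inferType(raw, true));
        System.out.println(inferType(raw, false));
    }
}
```

A value that always looks like a timestamp therefore parses as a number whenever inference is on, even though the data was written as a string.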

After setting spark.sql.sources.partitionColumnTypeInference.enabled to false, the problem was resolved nicely. :)
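
For reference, one way to apply that setting from Java is sketched below. It assumes Spark 1.5 on the classpath and a local master. The class name, the application name, and the /path/to/records location are placeholders rather than values from the original post.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class DisablePartitionTypeInference {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("partition-inference-demo") // placeholder name
                .setMaster("local[*]");                 // assumption: local run
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // Keep partition columns as strings instead of inferring numeric types.
        // Set this before the Parquet data is loaded.
        sqlContext.setConf(
                "spark.sql.sources.partitionColumnTypeInference.enabled", "false");

        // Placeholder path; point this at the partitioned Parquet directory.
        DataFrame df = sqlContext.read().parquet("/path/to/records");
        df.printSchema();

        sc.stop();
    }
}
```

The same property can also be supplied at submit time, for example with --conf on spark-submit, if changing the application code is not convenient.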


java

apache-spark

parquet

apache-spark-sql