从 Mapr DB table 推断出 InvalidType 的 Spark 数据帧时间戳列
Spark dataframe Timestamp column inferred as of InvalidType from Mapr DB table
我使用 Spark 从 MapR DB 读取 table。但是时间戳列被推断为 InvalidType。当您从 Mapr 数据库读取数据时,也没有设置模式的选项。
root
|-- Name: string (nullable = true)
|-- dt: struct (nullable = true)
| |-- InvalidType: string (nullable = true)
我试图将列转换为时间戳,但出现以下异常。
val df = spark.loadFromMapRDB("path")
df.withColumn("dt1", $"dt" ("InvalidType").cast(TimestampType))
.drop("dt")
df.show(5, false)
com.mapr.db.spark.exceptions.SchemaMappingException: Schema cannot be
inferred for the column {dt}
at com.mapr.db.spark.sql.utils.MapRSqlUtils$.convertField(MapRSqlUtils.scala:250)
at com.mapr.db.spark.sql.utils.MapRSqlUtils$.convertObject(MapRSqlUtils.scala:64)
at com.mapr.db.spark.sql.utils.MapRSqlUtils$.convertRootField(MapRSqlUtils.scala:48)
at com.mapr.db.spark.sql.utils.MapRSqlUtils$$anonfun$documentsToRow.apply(MapRSqlUtils.scala:34)
at com.mapr.db.spark.sql.utils.MapRSqlUtils$$anonfun$documentsToRow.apply(MapRSqlUtils.scala:33)
at scala.collection.Iterator$$anon.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon.hasNext(Iterator.scala:408)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$$anon.hasNext(WholeStageCodegenExec.scala:395)
at org.apache.spark.sql.execution.SparkPlan$$anonfun.apply(SparkPlan.scala:234)
at org.apache.spark.sql.execution.SparkPlan$$anonfun.apply(SparkPlan.scala:228)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$$anonfun$apply.apply(RDD.scala:827)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$$anonfun$apply.apply(RDD.scala:827)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
我们将不胜感激。
如果您知道 table 的架构。您可以创建自己的案例 class 来定义 table 的架构,然后使用此案例 class.
加载 table
完成此 link 从 MapR 数据库加载数据作为 Apache Spark 数据集
并且还要检查 MapRDB 中的 table 该特定列是否具有有效架构
我使用 Spark 从 MapR DB 读取 table。但是时间戳列被推断为 InvalidType。当您从 Mapr 数据库读取数据时,也没有设置模式的选项。
root
|-- Name: string (nullable = true)
|-- dt: struct (nullable = true)
| |-- InvalidType: string (nullable = true)
我试图将列转换为时间戳,但出现以下异常。
val df = spark.loadFromMapRDB("path")
df.withColumn("dt1", $"dt" ("InvalidType").cast(TimestampType))
.drop("dt")
df.show(5, false)
com.mapr.db.spark.exceptions.SchemaMappingException: Schema cannot be inferred for the column {dt} at com.mapr.db.spark.sql.utils.MapRSqlUtils$.convertField(MapRSqlUtils.scala:250) at com.mapr.db.spark.sql.utils.MapRSqlUtils$.convertObject(MapRSqlUtils.scala:64) at com.mapr.db.spark.sql.utils.MapRSqlUtils$.convertRootField(MapRSqlUtils.scala:48) at com.mapr.db.spark.sql.utils.MapRSqlUtils$$anonfun$documentsToRow.apply(MapRSqlUtils.scala:34) at com.mapr.db.spark.sql.utils.MapRSqlUtils$$anonfun$documentsToRow.apply(MapRSqlUtils.scala:33) at scala.collection.Iterator$$anon.nextCur(Iterator.scala:434) at scala.collection.Iterator$$anon.hasNext(Iterator.scala:440) at scala.collection.Iterator$$anon.hasNext(Iterator.scala:408) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$$anon.hasNext(WholeStageCodegenExec.scala:395) at org.apache.spark.sql.execution.SparkPlan$$anonfun.apply(SparkPlan.scala:234) at org.apache.spark.sql.execution.SparkPlan$$anonfun.apply(SparkPlan.scala:228) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$$anonfun$apply.apply(RDD.scala:827) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$$anonfun$apply.apply(RDD.scala:827) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
我们将不胜感激。
如果您知道 table 的架构。您可以创建自己的案例 class 来定义 table 的架构,然后使用此案例 class.
加载 table完成此 link 从 MapR 数据库加载数据作为 Apache Spark 数据集
并且还要检查 MapRDB 中的 table 该特定列是否具有有效架构