Error while reading date and datetime column from mariadb via spark
I am reading a MariaDB table that has date and datetime fields from Spark, and Spark throws an error while reading.
Below is the schema of the MariaDB table:
Spark code to read the MariaDB table:
val df = spark.read.format("jdbc")
  .option("driver", "org.mariadb.jdbc.Driver")
  .option("url", "jdbc:mariadb://xxxx:xxxx/db")
  .option("user", "user")
  .option("password", "password")
  .option("dbtable", "select * from test_ankur")
  .load()
df.select("ptime").show()
The following error occurs for the date field:
Caused by: java.sql.SQLTransientConnectionException: Could not get object as Date : Unparseable date: "ptime"
at org.mariadb.jdbc.internal.util.exceptions.ExceptionFactory.createException(ExceptionFactory.java:79)
at org.mariadb.jdbc.internal.util.exceptions.ExceptionFactory.create(ExceptionFactory.java:183)
at org.mariadb.jdbc.internal.com.read.resultset.rowprotocol.TextRowProtocol.getInternalDate(TextRowProtocol.java:546)
at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.getDate(SelectResultSet.java:1065)
The following error occurs for the datetime field:
Caused by: java.sql.SQLException: cannot parse data in timestamp string 'start_date'
at org.mariadb.jdbc.internal.com.read.resultset.rowprotocol.TextRowProtocol.getInternalTimestamp(TextRowProtocol.java:645)
at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.getTimestamp(SelectResultSet.java:1125)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter.apply(JdbcUtils.scala:452)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter.apply(JdbcUtils.scala:451)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon.getNext(JdbcUtils.scala:356)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon.getNext(JdbcUtils.scala:338)
I ran into a similar issue while trying to fetch date-type values. I added "nullCatalogMeansCurrent=true" to the URL and it worked.
url "jdbc:mariadb://xxxx:3306/datalake_test?useSSL=FALSE&nullCatalogMeansCurrent=true"
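A minimal sketch of how that URL could be plugged into the Spark reader from the question (host, port, database, credentials, and table name are placeholders; whether nullCatalogMeansCurrent has any effect depends on the driver version, so treat this as an assumption to verify):

```scala
// Sketch: the question's reader, with the answer's extra URL parameters.
// All connection details below are placeholders, not working values.
val df = spark.read.format("jdbc")
  .option("driver", "org.mariadb.jdbc.Driver")
  .option("url", "jdbc:mariadb://xxxx:3306/datalake_test?useSSL=FALSE&nullCatalogMeansCurrent=true")
  .option("user", "user")
  .option("password", "password")
  // dbtable normally takes a table name or a parenthesized subquery
  // such as "(select * from test_ankur) t", not a bare SELECT.
  .option("dbtable", "test_ankur")
  .load()
```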
I got this working by changing the connection string to mysql:
jdbc:mysql://xxxx:xxxx/db
Per the MariaDB documentation, MariaDB Column Store with Spark:
Currently Spark does not correctly recognize mariadb specific jdbc connect strings and so the jdbc:mysql syntax must be used.
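A sketch of the same read using the jdbc:mysql URL scheme the docs call for. Note that older MariaDB Connector/J (2.x) accepted jdbc:mysql URLs by default, while 3.x requires the permitMysqlScheme URL option; check this against your driver version:

```scala
// Sketch: same reader, with the jdbc:mysql URL scheme from the quoted docs.
// Host, port, database, credentials, and table name are placeholders.
// For MariaDB Connector/J 3.x you may need to append "?permitMysqlScheme"
// to the URL for the driver to accept the mysql scheme.
val df = spark.read.format("jdbc")
  .option("driver", "org.mariadb.jdbc.Driver")
  .option("url", "jdbc:mysql://xxxx:xxxx/db")
  .option("user", "user")
  .option("password", "password")
  .option("dbtable", "test_ankur")
  .load()
```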