Spark 无法识别 SQL 查询中的列名,但可以将其输出到数据集
Spark doesn't recognize the column name in SQL query while can output it to a dataset
我正在像这样应用 SQL 查询:
s"SELECT * FROM my_table_joined WHERE (timestamp > '2022-01-23' and writetime is not null and acceptTimestamp is not null)"
我收到了这样的错误消息。
warning: there was one deprecation warning (since 2.0.0); for details, enable `:setting -deprecation' or `:replay -deprecation'
org.postgresql.util.PSQLException: ERROR: column "accepttimestamp" does not exist
Hint: Perhaps you meant to reference the column "mf_joined.acceptTimestamp".
Position: 103
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2497)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2233)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:310)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:446)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:370)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:149)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:108)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:61)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:344)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)
at org.apache.spark.sql.DataFrameReader.$anonfun$load(DataFrameReader.scala:286)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:221)
at $$$e76229fa87b6865de321c5274e52c2f9$$$$w$getDFFromJdbcSource(<console>:1133)
... 326 elided
如果我这样省略 acceptTimestamp
:
s"SELECT * FROM my_table_joined WHERE (timestamp > '2022-01-23' and writetime is not null)"
我得到的数据如下:
+-------------------+----------+----+------------------+-----------------+---+-----+------+----------+---------------+-------+-----------------------+----------+---------+-------------+------------+---------------+---------+-----+-------------------+-----------------------+---------------+--------------+-------------+-------------------+-------------------+---+---+------------------+-----+----+----+------------------+---+
|timestamp |flags |type|lon |lat |alt|speed|course|satellites|digital_twin_id|unit_id|unit_ts |name |unit_type|measure_units|access_level|uid |placement|stale|start |writetime |acceptTimestamp|delayWindowEnd|DiffInSeconds|time |hour |max|min|mean |count|max2|min2|mean2 |rnb|
+-------------------+----------+----+------------------+-----------------+---+-----+------+----------+---------------+-------+-----------------------+----------+---------+-------------+------------+---------------+---------+-----+-------------------+-----------------------+---------------+--------------+-------------+-------------------+-------------------+---+---+------------------+-----+----+----+------------------+---+
请注意 acceptTimestamp
在这里!
那么我应该如何处理查询中的这一列以将其考虑在内?
从异常来看,这似乎与 Postgres 有关,而不是 Spark。如果您查看收到的错误消息,列名将折叠为小写 accepttimestamp
,而在您的查询中 T
为大写 acceptTimestamp
.
要使 Postgres 的列名 case-sensitive,您需要使用 double-quotes。试试这个:
val query = s"""SELECT * FROM my_table_joined
WHERE timestamp > '2022-01-23'
and writetime is not null
and "acceptTimestamp" is not null"""
我正在像这样应用 SQL 查询:
s"SELECT * FROM my_table_joined WHERE (timestamp > '2022-01-23' and writetime is not null and acceptTimestamp is not null)"
我收到了这样的错误消息。
warning: there was one deprecation warning (since 2.0.0); for details, enable `:setting -deprecation' or `:replay -deprecation'
org.postgresql.util.PSQLException: ERROR: column "accepttimestamp" does not exist
Hint: Perhaps you meant to reference the column "mf_joined.acceptTimestamp".
Position: 103
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2497)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2233)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:310)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:446)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:370)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:149)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:108)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:61)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:344)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)
at org.apache.spark.sql.DataFrameReader.$anonfun$load(DataFrameReader.scala:286)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:221)
at $$$e76229fa87b6865de321c5274e52c2f9$$$$w$getDFFromJdbcSource(<console>:1133)
... 326 elided
如果我这样省略 acceptTimestamp
:
s"SELECT * FROM my_table_joined WHERE (timestamp > '2022-01-23' and writetime is not null)"
我得到的数据如下:
+-------------------+----------+----+------------------+-----------------+---+-----+------+----------+---------------+-------+-----------------------+----------+---------+-------------+------------+---------------+---------+-----+-------------------+-----------------------+---------------+--------------+-------------+-------------------+-------------------+---+---+------------------+-----+----+----+------------------+---+
|timestamp |flags |type|lon |lat |alt|speed|course|satellites|digital_twin_id|unit_id|unit_ts |name |unit_type|measure_units|access_level|uid |placement|stale|start |writetime |acceptTimestamp|delayWindowEnd|DiffInSeconds|time |hour |max|min|mean |count|max2|min2|mean2 |rnb|
+-------------------+----------+----+------------------+-----------------+---+-----+------+----------+---------------+-------+-----------------------+----------+---------+-------------+------------+---------------+---------+-----+-------------------+-----------------------+---------------+--------------+-------------+-------------------+-------------------+---+---+------------------+-----+----+----+------------------+---+
请注意 acceptTimestamp
在这里!
那么我应该如何处理查询中的这一列以将其考虑在内?
从异常来看,这似乎与 Postgres 有关,而不是 Spark。如果您查看收到的错误消息,列名将折叠为小写 accepttimestamp
,而在您的查询中 T
为大写 acceptTimestamp
.
要使 Postgres 的列名 case-sensitive,您需要使用 double-quotes。试试这个:
val query = s"""SELECT * FROM my_table_joined
WHERE timestamp > '2022-01-23'
and writetime is not null
and "acceptTimestamp" is not null"""