How to create an external Hive table without a location?
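I have a Spark SQL 2.1.1 job running on a YARN cluster in cluster mode, in which I want to create an empty external Hive table (partitions with locations will be added in a later step).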
CREATE EXTERNAL TABLE IF NOT EXISTS new_table (id BIGINT, StartTime TIMESTAMP, EndTime TIMESTAMP) PARTITIONED BY (year INT, month INT, day INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
When I run the job I get the error:
CREATE EXTERNAL TABLE must be accompanied by LOCATION
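But when I run the same query in the Hive Editor in Hue, it runs just fine. I tried to find an answer in the Spark SQL 2.1.1 documentation but came up empty.
Does anyone know why Spark SQL is stricter about this query?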
TL;DR EXTERNAL without LOCATION is not allowed.
The definitive answer is in Spark SQL's grammar definition file, SqlBase.g4.
There you can find the definition of CREATE EXTERNAL TABLE as createTableHeader:
CREATE TEMPORARY? EXTERNAL? TABLE (IF NOT EXISTS)? tableIdentifier
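This definition is used in the supported SQL statements.
Unless I'm mistaken, locationSpec is optional. That is according to the ANTLR grammar. The code may decide otherwise, and it seems it does.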
scala> spark.version
res4: String = 2.3.0-SNAPSHOT
scala> val q = "CREATE EXTERNAL TABLE IF NOT EXISTS new_table (id BIGINT, StartTime TIMESTAMP, EndTime TIMESTAMP) PARTITIONED BY (year INT, month INT, day INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'"
scala> sql(q)
org.apache.spark.sql.catalyst.parser.ParseException:
Operation not allowed: CREATE EXTERNAL TABLE must be accompanied by LOCATION(line 1, pos 0)
== SQL ==
CREATE EXTERNAL TABLE IF NOT EXISTS new_table (id BIGINT, StartTime TIMESTAMP, EndTime TIMESTAMP) PARTITIONED BY (year INT, month INT, day INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
^^^
at org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39)
at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateHiveTable.apply(SparkSqlParser.scala:1096)
at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateHiveTable.apply(SparkSqlParser.scala:1064)
at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitCreateHiveTable(SparkSqlParser.scala:1064)
at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitCreateHiveTable(SparkSqlParser.scala:55)
at org.apache.spark.sql.catalyst.parser.SqlBaseParser$CreateHiveTableContext.accept(SqlBaseParser.java:1124)
at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement.apply(AstBuilder.scala:71)
at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement.apply(AstBuilder.scala:71)
at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:70)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan.apply(ParseDriver.scala:69)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan.apply(ParseDriver.scala:68)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:97)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623)
... 48 elided
The default SparkSqlParser (with astBuilder as SparkSqlAstBuilder) has the following assertion that leads to the exception:
if (external && location.isEmpty) {
  operationNotAllowed("CREATE EXTERNAL TABLE must be accompanied by LOCATION", ctx)
}
I'd consider reporting an issue in Spark's JIRA if you think that the case should be allowed. See SPARK-2825 for an argument in favor of supporting it:
CREATE EXTERNAL TABLE already works as far as I know and should have the same semantics as Hive.
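Until that changes, one way around the check is to give the table a LOCATION up front and attach the partitions with their own locations afterwards. The following is only a minimal sketch under some assumptions: the base directory /tmp/warehouse/new_table, the partition directory layout and the object name are made up for illustration, and the session is assumed to have Hive support enabled. I have not run it against 2.1.1 specifically.

```scala
import org.apache.spark.sql.SparkSession

object ExternalTableWithLocation {
  // Hypothetical base directory; replace with a path that exists on your cluster.
  val baseDir = "/tmp/warehouse/new_table"

  // Same DDL as in the question, plus the LOCATION clause the parser insists on.
  // The s-interpolator processes escapes, so \\t puts the two characters \t into the SQL text.
  val createTable: String =
    s"""CREATE EXTERNAL TABLE IF NOT EXISTS new_table (id BIGINT, StartTime TIMESTAMP, EndTime TIMESTAMP)
       |PARTITIONED BY (year INT, month INT, day INT)
       |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
       |LOCATION '$baseDir'""".stripMargin

  // Builds the statement that registers one partition with its own location.
  // The year=/month=/day= directory layout is an assumption, not a requirement.
  def addPartition(year: Int, month: Int, day: Int): String =
    s"ALTER TABLE new_table ADD IF NOT EXISTS PARTITION (year=$year, month=$month, day=$day) " +
      s"LOCATION '$baseDir/year=$year/month=$month/day=$day'"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("external-table-with-location")
      .enableHiveSupport() // needed for Hive DDL such as ROW FORMAT DELIMITED
      .getOrCreate()

    spark.sql(createTable)             // empty external table, location supplied
    spark.sql(addPartition(2017, 6, 1)) // later step: attach a partition
    spark.stop()
  }
}
```

The table stays empty until partitions are added, which matches what the question is after; the only difference from the Hive behaviour is that the table-level directory has to be named explicitly.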