Drill "VALIDATION ERROR: A table or view with given name already exists in schema" for empty directory

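After upgrading drill on our cluster to drill-1.12.0-mapr and testing our daily ETL scripts (which all use drill to convert parquet files to tsv), a validation error ("table or view with given name already exists") is always thrown when trying to run CREATE TABLE statements against some empty directories in a writable workspace.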

[Error Id: 6ea46737-8b6a-4887-a671-4bddbea02476 on mapr002.ucera.local:31010]
at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
at org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
:
:
:
Caused by: org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: A table or view with given name [/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] already exists in schema [dfs.etl_internal]

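After some brief debugging, I see that the relevant FS directory under the specified dfs.etl_internal workspace (i.e. /internal_etl/project/version-2/stages/storage/ACCOUNT/tsv) is in fact empty, yet these errors are still thrown.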
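For anyone wanting to repeat the "is the directory really empty" check, here is a minimal sketch. It assumes the standard `hadoop fs -count <path>` output layout (DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME, where DIR_COUNT includes the directory itself). The helper name is_empty_count_line is made up for illustration and is not part of any tool.

```shell
#!/bin/sh
# Sketch: decide whether a directory is empty from one line of
# `hadoop fs -count <path>` output.
# Assumed layout: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
# DIR_COUNT counts the directory itself, so an empty directory
# is expected to report "1 0 0 <path>".

# Hypothetical helper: succeeds only when the line describes an empty directory.
is_empty_count_line() {
    # Split the line into whitespace-separated fields.
    set -- $1
    dirs=$1
    files=$2
    [ "$dirs" -eq 1 ] && [ "$files" -eq 0 ]
}

# Usage against a live cluster (requires the hadoop CLI on PATH):
#   line=$(hadoop fs -count /internal_etl/project/version-2/stages/storage/ACCOUNT/tsv)
#   if is_empty_count_line "$line"; then echo "empty"; else echo "not empty"; fi
```

Note that any hidden or leftover files inside the directory are counted too, so this reports "not empty" for them as well.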

Looking up the error ID in the drillbit.log file on the node named in the error message above, we see

2018-12-04 10:13:25,285 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query id 23f92019-db56-862f-e7b9-cd51b3e174ae: create table dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv` as 
select <a bunch of fields>
from dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/parquet`
2018-12-04 10:13:25,406 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-12-04 10:13:25,408 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-12-04 10:13:25,893 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-12-04 10:13:25,894 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-12-04 10:13:25,898 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-12-04 10:13:25,898 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-12-04 10:13:25,905 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO  o.a.d.e.p.s.h.CreateTableHandler - User Error Occurred: A table or view with given name [/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] already exists in schema [dfs.etl_internal]
org.apache.drill.common.exceptions.UserException: VALIDATION ERROR: A table or view with given name [/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] already exists in schema [dfs.etl_internal]


[Error Id: 45177abc-7e9f-4678-959f-f9e0e38bc564 ]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586) ~[drill-common-1.12.0-mapr.jar:1.12.0-mapr]
    at org.apache.drill.exec.planner.sql.handlers.CreateTableHandler.checkTableCreationPossibility(CreateTableHandler.java:326) [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
    at org.apache.drill.exec.planner.sql.handlers.CreateTableHandler.getPlan(CreateTableHandler.java:90) [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
    at org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:131) [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
    at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:79) [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
    at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:567) [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
    at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:264) [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_151]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
2018-12-04 10:13:25,924 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO  o.apache.drill.exec.work.WorkManager - Waiting for 0 queries to complete before shutting down
2018-12-04 10:13:25,924 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO  o.apache.drill.exec.work.WorkManager - Waiting for 0 running fragments to complete before shutting down

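This error occurs even when a DROP TABLE [IF EXISTS] <workspace>.<table path name> statement is run before the CREATE TABLE statement. Furthermore, the configuration of the dfs workspace itself does not appear to have changed from before the upgrade to drill-1.12; see below: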

:
:
"workspaces": {
  "root": {
    "location": "/",
    "writable": false,
    "defaultInputFormat": null,
    "allowAccessOutsideWorkspace": false
  },
  "tmp": {
    "location": "/tmp",
    "writable": true,
    "defaultInputFormat": null,
    "allowAccessOutsideWorkspace": false
  },
  "etl_internal": {
    "location": "/etl/internal",
    "writable": true,
    "defaultInputFormat": null,
    "allowAccessOutsideWorkspace": false
  }
},
:
:

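Note that the full process in question is meant to mv the directory contents each day and CREATE TABLE using that day's new data (in case that matters), and this process worked fine while we were using drill-1.11.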
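To make the statement sequence concrete, here is a rough sketch of the SQL such a daily refresh would send to drill. It is illustrative only, not the actual ETL script: the helper name ctas_tsv_sql is hypothetical, SELECT * stands in for the real field list, and it assumes the tsv output is selected through Drill's `store.format` session option. On drill-1.12.0-mapr this sequence is exactly what still produced the validation error described above.

```shell
#!/bin/sh
# Sketch: print the Drill SQL for one parquet -> tsv refresh.
# ctas_tsv_sql is a hypothetical helper; arguments are:
#   $1 workspace (e.g. dfs.etl_internal), $2 source path, $3 target path.
ctas_tsv_sql() {
    ws=$1
    src=$2
    dst=$3
    # Backticks are escaped so the here-document emits them literally.
    cat <<EOF
ALTER SESSION SET \`store.format\` = 'tsv';
DROP TABLE IF EXISTS ${ws}.\`${dst}\`;
CREATE TABLE ${ws}.\`${dst}\` AS
SELECT * FROM ${ws}.\`${src}\`;
EOF
}

# Example: write the statements to a file for later use with sqlline.
#   ctas_tsv_sql dfs.etl_internal \
#       /internal_etl/project/version-2/stages/storage/ACCOUNT/parquet \
#       /internal_etl/project/version-2/stages/storage/ACCOUNT/tsv > refresh.sql
```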

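More debugging info:

Simply deleting the .../tsv endpoint folder and relying on drill to create the directory during the CREATE TABLE statement does not work. It throws the expected error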

Error: VALIDATION ERROR: Table [/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] not found
[Error Id: 02e7c088-9162-4731-9fa8-85dfd39e1dec on mapr001.ucera.local:31010] (state=,code=0)

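I.e. drill does not seem to create the table automatically. Undoing these changes and rerunning reproduces the original error, and we can inspect the location through the sqlline interpreter. Doing so, we see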

0: jdbc:drill:zk=mapr001:5181,mapr002:5181,ma> describe dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv`;
+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
+--------------+------------+--------------+
No rows selected (1.791 seconds)

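So it sees something there, but only when I create the directory myself, which is a catch-22 because the original error complains that this very thing already exists.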

If anyone with more drill experience knows what could be going on here, any advice or suggestions would be appreciated.

TLDR: restarted the drillbits on the nodes and everything now seems to be working.

What we did to get drill to run the CTAS statements without error:

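  1. Restarted the drill service from the MapR MCS. This was based purely on a hunch: earlier, after upgrading from drill-1.11 to drill-1.12, we had hit a problem with hanging drill-1.11 processes, where we had to go to each node manually, use jps to see that a 1.11 drillbit was still running, kill -9 <pid of 1.11 drillbit>, and restart the drillbits to get 1.12 working. Not sure how much this helped, but I am noting it because it is the only change made during this debugging that was not undone before the error finally appeared to be resolved.
  2. Changed the drill scripts to delete the target folder of the CTAS statement (hadoop fs -rm -r /hdfs/path/to/folder) after running some necessary processes on it, and then let the CTAS statement recreate it on its own (although, as mentioned in the original post, this was tried earlier and produced the "Table not found" error in that odd catch-22 situation, which is why I think restarting the drill service may have contributed).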
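The two steps above can be sketched roughly as follows. This is only an illustration under assumptions, not our actual script: the function names are hypothetical, it assumes a drillbit shows up in plain `jps` output as "<pid> Drillbit", and it assumes sqlline is started with -u for the JDBC URL and --run= for a SQL file. The second function only prints the commands so they can be reviewed before anything is deleted.

```shell
#!/bin/sh
# Step 1 helper (hypothetical name): read `jps` output on stdin, assumed to be
# "<pid> <main class>" per line, and print the PIDs of Drillbit processes so
# leftovers from an old install can be inspected before any kill -9.
drillbit_pids() {
    awk '$2 == "Drillbit" { print $1 }'
}

# Step 2 helper (hypothetical name): print the commands that remove the CTAS
# target folder and then run a SQL file through sqlline. Nothing is executed.
#   $1 target folder, $2 SQL file containing the CTAS, $3 JDBC URL
plan_ctas_refresh() {
    target_dir=$1
    sql_file=$2
    jdbc_url=$3
    printf 'hadoop fs -rm -r %s\n' "$target_dir"
    printf 'sqlline -u "%s" --run=%s\n' "$jdbc_url" "$sql_file"
}

# Usage on a node:
#   jps | drillbit_pids
#   plan_ctas_refresh /hdfs/path/to/folder refresh.sql "$DRILL_JDBC_URL"
```

Here DRILL_JDBC_URL and refresh.sql are placeholders for whatever connection string and SQL file the ETL job uses. Telling a 1.11 drillbit from a 1.12 one still needs a manual look at each reported PID.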

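I know that just restarting the service may not be the most useful answer, but that is what seems to have worked here. If anyone has more information or ideas to add based on the solution described above, please comment.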

It looks like you made some mistakes while updating the Drill version on the MapR cluster.

See this document for details: http://doc.mapr.com/display/MapR/Upgrading+to+the+Latest+Version+of+Drill
or the latest docs, in case you are on the latest MapR Core version: https://mapr.com/docs/home/UpgradeGuide/PreupgradeStepsDrill.html?hl=drill%2Cupgrade
https://mapr.com/docs/home/UpgradeGuide/PostUpgradeStepsDrill.html?hl=drill%2Cupgrade

DROP TABLE works fine for Drill schemaless tables. See more about Drill schemaless tables (empty directories):
https://drill.apache.org/docs/data-sources-and-file-formats-introduction/#schemaless-tables