How to view with this query on Apache Drill

Question

I need help.

I have data like this:

anum    bnum
8661994 8661993
8661994 8661993
8661994 8661993
8661992 8661994

In SQL I can do something like this:

SELECT
anum,
(
    SELECT COUNT(*)
    FROM dataku t2
    WHERE t2.anum=t1.anum
),
(
    SELECT COUNT(*)
    FROM dataku t3
    WHERE t3.bnum=t1.anum
)
FROM dataku t1
GROUP BY t1.anum;

Result:

anum       count_anum      count_anum_on_bnum
8661992    1                0
8661994    3                1

How can I achieve this in Apache Drill? (The data is in CSV files.) I tried the following, but it gave me an error:

SELECT
anum,
(
    SELECT COUNT(*)
    FROM hdfs.`/test/*` as t2
    WHERE t2.anum=t1.anum
),
(
    SELECT COUNT(*)
    FROM hdfs.`/test/*` as t3
    WHERE t3.anum=t1.anum
)
FROM hdfs.`/test/*` as t1
GROUP BY t1.anum
LIMIT 1000

The error is: org.apache.drill.common.exceptions.UserRemoteException: PLAN ERROR: Cannot convert RexNode to equivalent Drill expression. RexNode Class: org.apache.calcite.rex.RexCorrelVariable, RexNode Digest: $cor1 [Error Id: 7e975eb8-ab37-432f-9387-99126f1f43cf on master:31010]

The csv format configuration in my hdfs storage plugin:

"csv": {
  "type": "text",
  "extensions": [
    "csv"
  ],
  "delimiter": ","
},

Answer 1

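I tried this on Drill 1.13 and ran into an NPE. A couple of questions: which version of Drill are you using? Also, could you paste the configuration you use for "csv" in your DFS storage plugin?

For example, I have this: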

 "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "extractHeader": true,
      "delimiter": ","
    }
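
For context on why this setting matters: as far as I know, when "extractHeader" is not set Drill exposes each line of a text file as a single `columns` array, and a header line (if the file has one) comes back as an ordinary data row. A rough sketch of the two ways to address the fields, assuming the path and field order from the question:

```sql
-- Without "extractHeader": fields are addressed by position in the `columns` array.
SELECT columns[0] AS anum, columns[1] AS bnum
FROM hdfs.`/test/*`
LIMIT 10;

-- With "extractHeader": true, the first line of each file supplies the column names.
SELECT anum, bnum
FROM hdfs.`/test/*`
LIMIT 10;
```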

Answer 2

Add the "extractHeader": true property to your CSV format plugin and use the following query:

0: jdbc:drill:zk=local> select t1.anum, t1.count_anum, coalesce(t2.count_bnum, 0) as count_anum_on_bnum from 
. . . . . . . . . . . > (select anum, count(anum) as `count_anum` from dfs.`/tmp/test.csv` group by anum) t1
. . . . . . . . . . . > left join 
. . . . . . . . . . . > (select bnum, count(bnum) as `count_bnum` from dfs.`/tmp/test.csv` group by bnum) t2
. . . . . . . . . . . > on t1.anum = t2.bnum;
+----------+-------------+---------------------+
|   anum   | count_anum  | count_anum_on_bnum  |
+----------+-------------+---------------------+
| 8661992  | 1           | 0                   |
| 8661994  | 3           | 1                   |
+----------+-------------+---------------------+
2 rows selected (0.167 seconds)

Drill cannot plan the query you provided. You can file a Jira ticket to have it implemented: https://issues.apache.org/jira/projects/DRILL
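
If changing the plugin configuration is not an option, the same join rewrite can presumably be written against the positional `columns` array instead. This is an untested sketch: the hdfs.`/test/*` path is taken from the question, and it assumes the files have no header line (otherwise the header text would be counted as a value).

```sql
SELECT t1.anum,
       t1.count_anum,
       COALESCE(t2.count_bnum, 0) AS count_anum_on_bnum
FROM (
    -- How often each value occurs in the first field (anum)
    SELECT columns[0] AS anum, COUNT(*) AS count_anum
    FROM hdfs.`/test/*`
    GROUP BY columns[0]
) t1
LEFT JOIN (
    -- How often each value occurs in the second field (bnum)
    SELECT columns[1] AS bnum, COUNT(*) AS count_bnum
    FROM hdfs.`/test/*`
    GROUP BY columns[1]
) t2
ON t1.anum = t2.bnum;
```

The LEFT JOIN keeps every anum value even when it never appears in bnum, and COALESCE turns the resulting NULL into 0, which matches what the correlated COUNT(*) subquery would return.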
