How to run this query in Apache Drill
I need some help.
I have data like this:
anum bnum
8661994 8661993
8661994 8661993
8661994 8661993
8661992 8661994
In SQL, I can do something like this:
SELECT
anum,
(
SELECT COUNT(*)
FROM dataku t2
WHERE t2.anum=t1.anum
),
(
SELECT COUNT(*)
FROM dataku t3
WHERE t3.bnum=t1.anum
)
FROM dataku t1
GROUP BY t1.anum;
Result:
anum count_anum count_anum_on_bnum
8661992 1 0
8661994 3 1
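To confirm the expected output independently of Drill, here is a minimal sketch that runs the correlated query above against SQLite through Python's built-in sqlite3 module, using the sample rows. SQLite is only a stand-in here (an assumption, not Drill), and the explicit column aliases and ORDER BY are additions so the columns are named and the row order is deterministic.

```python
import sqlite3

# Sample rows from the question: (anum, bnum)
ROWS = [
    (8661994, 8661993),
    (8661994, 8661993),
    (8661994, 8661993),
    (8661992, 8661994),
]

# The question's correlated-subquery form; aliases and ORDER BY added.
CORRELATED_SQL = """
SELECT
  anum,
  (SELECT COUNT(*) FROM dataku t2 WHERE t2.anum = t1.anum) AS count_anum,
  (SELECT COUNT(*) FROM dataku t3 WHERE t3.bnum = t1.anum) AS count_anum_on_bnum
FROM dataku t1
GROUP BY t1.anum
ORDER BY t1.anum
"""


def run_correlated(rows):
    """Load rows into an in-memory SQLite table and run the correlated query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE dataku (anum INTEGER, bnum INTEGER)")
        conn.executemany("INSERT INTO dataku VALUES (?, ?)", rows)
        return conn.execute(CORRELATED_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_correlated(ROWS):
        print(row)
```

On the sample data this yields one row per distinct anum, matching the result table above.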
How can I achieve this in Apache Drill? (The data is in CSV files.)
I tried the following, but it gave me an error:
SELECT
anum,
(
SELECT COUNT(*)
FROM hdfs.`/test/*` as t2
WHERE t2.anum=t1.anum
),
(
SELECT COUNT(*)
FROM hdfs.`/test/*` as t3
WHERE t3.bnum=t1.anum
)
FROM hdfs.`/test/*` as t1
GROUP BY t1.anum
LIMIT 1000
The error is:
org.apache.drill.common.exceptions.UserRemoteException: PLAN ERROR: Cannot convert RexNode to equivalent Drill expression. RexNode Class: org.apache.calcite.rex.RexCorrelVariable, RexNode Digest: $cor1 [Error Id: 7e975eb8-ab37-432f-9387-99126f1f43cf on master:31010]
My CSV format config in the hdfs storage plugin:
"csv": {
"type": "text",
"extensions": [
"csv"
],
"delimiter": ","
},
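I tried this on Drill 1.13 and ran into an NPE.
A couple of questions:
Which version of Drill are you using?
Also, could you paste the "csv" format config from your DFS storage plugin?
For example, I have this: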
"csv": {
"type": "text",
"extensions": [
"csv"
],
"extractHeader": true,
"delimiter": ","
}
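Why extractHeader matters: without it, Drill's text reader has no column names, so it exposes each line as a single `columns` array (queried as `columns[0]`, `columns[1]`) and the names `anum`/`bnum` do not refer to the CSV fields. The sketch below is only an analogy using Python's standard csv module, not Drill code; the file contents are a hypothetical comma-separated version of the sample data with a header line.

```python
import csv
import io

# Hypothetical contents of /tmp/test.csv, with a header line.
CSV_TEXT = (
    "anum,bnum\n"
    "8661994,8661993\n"
    "8661994,8661993\n"
    "8661994,8661993\n"
    "8661992,8661994\n"
)


def read_without_header(text):
    """Every line, header included, becomes a positional list of strings
    (roughly like Drill's `columns` array when extractHeader is off)."""
    return list(csv.reader(io.StringIO(text)))


def read_with_header(text):
    """The first line supplies field names (roughly like extractHeader: true)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


if __name__ == "__main__":
    print(read_without_header(CSV_TEXT)[0])  # header treated as data
    print(read_with_header(CSV_TEXT)[0])     # fields addressable by name
```

In both views the values are strings; Drill likewise reads text-file fields as VARCHAR unless you cast them.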
Add the "extractHeader": true property to your CSV format plugin and use the following query:
0: jdbc:drill:zk=local> select t1.anum, t1.count_anum, coalesce(t2.count_bnum, 0) as count_anum_on_bnum from
. . . . . . . . . . . > (select anum, count(anum) as `count_anum` from dfs.`/tmp/test.csv` group by anum) t1
. . . . . . . . . . . > left join
. . . . . . . . . . . > (select bnum, count(bnum) as `count_bnum` from dfs.`/tmp/test.csv` group by bnum) t2
. . . . . . . . . . . > on t1.anum = t2.bnum;
+----------+-------------+---------------------+
| anum | count_anum | count_anum_on_bnum |
+----------+-------------+---------------------+
| 8661992 | 1 | 0 |
| 8661994 | 3 | 1 |
+----------+-------------+---------------------+
2 rows selected (0.167 seconds)
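The rewrite works because each correlated subquery only counts rows per key, so the counts can be computed once with GROUP BY and joined back. The LEFT JOIN keeps anum values that never appear in bnum, and COALESCE turns their NULL count into 0. The sketch below checks that equivalence on the sample rows with SQLite via Python's sqlite3 module; SQLite is a stand-in assumption, so this shows the SQL semantics, not Drill's planner.

```python
import sqlite3

# Sample rows from the question: (anum, bnum)
ROWS = [
    (8661994, 8661993),
    (8661994, 8661993),
    (8661994, 8661993),
    (8661992, 8661994),
]

# Original correlated form (Drill cannot plan this one).
CORRELATED_SQL = """
SELECT anum,
  (SELECT COUNT(*) FROM dataku t2 WHERE t2.anum = t1.anum) AS count_anum,
  (SELECT COUNT(*) FROM dataku t3 WHERE t3.bnum = t1.anum) AS count_anum_on_bnum
FROM dataku t1
GROUP BY t1.anum
ORDER BY t1.anum
"""

# Decorrelated form from the answer: aggregate each column once, then LEFT JOIN.
JOIN_SQL = """
SELECT t1.anum, t1.count_anum, COALESCE(t2.count_bnum, 0) AS count_anum_on_bnum
FROM (SELECT anum, COUNT(anum) AS count_anum FROM dataku GROUP BY anum) t1
LEFT JOIN (SELECT bnum, COUNT(bnum) AS count_bnum FROM dataku GROUP BY bnum) t2
  ON t1.anum = t2.bnum
ORDER BY t1.anum
"""


def run(sql, rows):
    """Run one query over an in-memory copy of the sample table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE dataku (anum INTEGER, bnum INTEGER)")
        conn.executemany("INSERT INTO dataku VALUES (?, ?)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(JOIN_SQL, ROWS))
    print(run(JOIN_SQL, ROWS) == run(CORRELATED_SQL, ROWS))
```

One difference to keep in mind: COUNT(anum) skips NULLs while COUNT(*) does not, so the two forms agree only when the key columns contain no NULLs, as in this sample.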
Drill cannot plan the query as you wrote it. You can file a Jira ticket to request support for it:
https://issues.apache.org/jira/projects/DRILL