Column name is not passed to PostgreSQL on JDBC Scan in Apache Drill
When I run the following SQL query against PostgreSQL through Apache Drill, Drill pushes * down to the database instead of the column names referenced in the query.
select
m.m_id,
cnt_c_no
from (
select
m_id
from pg_test_main.test1.table1
where
last_date >= '2019-01-01 00:00:00'
) as m
left join (
select
ci.m_id,
count(ci.c_no) as cnt_c_no
from (
select
m_id,
c_no
from pg_test.public.table2
) as ci
inner join (
select
c_no
from pg_test.public.table3
where
is_del = 'F'
) as c on ci.c_no = c.c_no
group by
ci.m_id
) as join1 on m.m_id = join1.m_id;
00-00 Screen
00-01 Project(m_id=[$0], cnt_c_no=[])
00-02 Project(m_id=[$0], cnt_c_no=[])
00-03 HashJoin(condition=[=($0, )], joinType=[left], semi-join: =[false])
00-05 Jdbc(sql=[SELECT "m_id" FROM "test1"."table1" WHERE "last_date" >= '2019-01-01 00:00:00' ])
00-04 Project(m_id0=[$0], cnt_c_no=[])
00-06 HashAgg(group=[{0}], cnt_c_no=[COUNT()])
00-07 Project(m_id=[$0], c_no=[])
00-08 HashJoin(condition=[=(, )], joinType=[inner], semi-join: =[false])
00-10 Project(m_id=[], c_no=[])
00-12 Jdbc(sql=[SELECT * FROM "public"."table2" ])
00-09 Project(c_no0=[$0])
00-11 Project(c_no=[$0])
00-13 SelectionVectorRemover
00-14 Filter(condition=[=(, 'F')])
00-15 Jdbc(sql=[SELECT * FROM "public"."table3" ])
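As you can see, the Jdbc scan of table1 uses the column name.
However, the Jdbc scans of table2 and table3 do not use column names; they push * down to the database.
How can I control the JDBC scan so that it pushes down the column names?
The Apache Drill version is 1.16.0 (embedded mode).
I tried to reproduce this with MySQL on Drill 1.17 and Drill 1.15, but for a query similar to the one you posted, the entire query is pushed down to the JDBC storage: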
SELECT m.person_id,
cnt_c_no
FROM
(SELECT person_id
FROM mysql.`drill_mysql_test1`.person1
WHERE date_field >= '2019-01-01 00:00:00') AS m
LEFT JOIN
(SELECT ci.person_id,
count(ci.last_name) AS cnt_c_no
FROM
(SELECT person_id,
last_name
FROM mysql.`drill_mysql_test`.person) AS ci
INNER JOIN
(SELECT last_name
FROM mysql.`drill_mysql_test`.person2
WHERE boolean_field = 'F' ) AS c ON ci.last_name = c.last_name
GROUP BY ci.person_id) AS join1 ON m.person_id = join1.person_id
The plan for this query:
00-00 Screen
00-01 Project(person_id=[$0], cnt_c_no=[])
00-02 Jdbc(sql=[SELECT `t0`.`person_id`, `t5`.`cnt_c_no` FROM (SELECT `person_id` FROM `drill_mysql_test1`.`person1` WHERE `date_field` >= '2019-01-01 00:00:00') AS `t0` LEFT JOIN (SELECT `t1`.`person_id`, COUNT(`t1`.`last_name`) AS `cnt_c_no` FROM (SELECT `person_id`, `last_name` FROM `drill_mysql_test`.`person`) AS `t1` INNER JOIN (SELECT `last_name` FROM `drill_mysql_test`.`person2` WHERE `boolean_field` = 'F') AS `t3` ON `t1`.`last_name` = `t3`.`last_name` GROUP BY `t1`.`person_id`) AS `t5` ON `t0`.`person_id` = `t5`.`person_id` ])
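Could you please provide the CTAS for the Postgres tables, so that I can try to reproduce it again with the specific data types? Also, if possible, please check whether the issue still reproduces on Drill 1.17.
Update:
The comments under this answer helped establish that this is caused by https://issues.apache.org/jira/browse/DRILL-7340, which will be fixed in Drill 1.18.0.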
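To check whether a given Drill version still shows the problem, it may help to scan the text plan for Jdbc operators that push down SELECT *. The following is only a rough sketch in Python: the helper names jdbc_scans and star_scans are hypothetical, and it assumes the plan has already been copied out as plain text in the Jdbc(sql=[...]) form shown in the question.

```python
import re

# Matches the SQL text inside a Jdbc(sql=[...]) operator of a text plan.
# Assumption: each operator sits on one line and ends with "])".
JDBC_SQL = re.compile(r"Jdbc\(sql=\[(.*?)\s*\]\)")


def jdbc_scans(plan_text):
    """Return the pushed-down SQL of every Jdbc operator, in plan order."""
    return [m.group(1).strip() for m in JDBC_SQL.finditer(plan_text)]


def star_scans(plan_text):
    """Return only the pushed-down statements that begin with SELECT *."""
    return [sql for sql in jdbc_scans(plan_text)
            if re.match(r"(?i)select\s+\*", sql)]


# Jdbc lines copied from the plan in the question.
plan = """\
00-05 Jdbc(sql=[SELECT "m_id" FROM "test1"."table1" WHERE "last_date" >= '2019-01-01 00:00:00' ])
00-12 Jdbc(sql=[SELECT * FROM "public"."table2" ])
00-15 Jdbc(sql=[SELECT * FROM "public"."table3" ])
"""

for sql in star_scans(plan):
    print("no column projection pushed down:", sql)
```

For the plan in the question this reports the table2 and table3 scans and skips the table1 scan. An empty result after an upgrade would suggest that the column projection is being pushed down.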