Self-join fails with where-clause
I am using Apache Drill to compute a YTD (year-to-date) sum over a CSV file with a self-referencing join. The (shortened) query is:
select
... fields from table a ...
a.PeriodAmount,
sum(cast(b.PeriodAmount as dec(18,3))) as YTDAmount
from dfs.`/home/foo/data/a.csv` a
left join dfs.`/home/foo/data/a.csv` b
on
... join-conditions ...
*** where a.Year = '2018' ***
group by
... group-conditions ...
order by
... order-conditions ...
;
The query works without the where clause. When the where clause is included, on the same dataset, I get the following error:
Error: UNSUPPORTED_OPERATION ERROR: This query cannot be planned possibly due to either a cartesian join or an inequality join
[Error Id: b62e6b63-eda7-4a52-8f95-2499a1f5c278 on foo:31010] (state=,code=0)
I can avoid the error by removing the where clause and using subqueries instead:
from (select * from dfs.`/home/foo/data/a.csv` where Year = '2017') a
left join (select * from dfs.`/home/foo/data/a.csv` where Year = '2017') b
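But I am not sure this is the right approach. It makes the query more error-prone, because the same condition has to be applied to several subqueries instead of living in the where clause, where it naturally belongs.
Is it possible to rewrite this self-join so that the where clause can be kept?
This is on Ubuntu 16.04, running under WSL on Windows 10; Apache Drill is version 1.13.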
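One way to keep the subquery workaround while stating the year condition only once is a common table expression (Drill supports the WITH clause). The following is a reduced sketch, not the full query: it uses only a few of the columns, and the CTE name ytd_src is a hypothetical name chosen for illustration. It is logically the same as the two-subquery form above, but it has not been run against Drill 1.13, so whether it avoids the planning error is an assumption.

```sql
-- Sketch: filter the CSV once, then self-join the filtered result.
-- ytd_src is a hypothetical name; the column list is shortened.
with ytd_src as (
    select Dep_id, Post_id, Period, Year, PeriodAmount
    from dfs.`/home/foo/data/a.csv`
    where Year = '2018'            -- the only place the year appears
)
select
    a.Dep_id,
    a.Post_id,
    a.Period,
    a.PeriodAmount,
    -- running total: all rows of the same key up to and including a.Period
    sum(cast(b.PeriodAmount as dec(18,3))) as YTDAmount
from ytd_src a
left join ytd_src b
    on  a.Dep_id  = b.Dep_id
    and a.Post_id = b.Post_id
    and b.Period <= a.Period
group by
    a.Dep_id, a.Post_id, a.Period, a.PeriodAmount
order by
    a.Dep_id, a.Post_id, a.Period
;
```

Because both sides read from the same filtered set, the a.Year = b.Year join condition is no longer needed in this form.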
Complete (working) query:
select
a.Dep_id,
a.Dep,
substr(a.Post_id, 1, 4) as Kap,
a.Post_id,
substr(a.Post_id, 5, 2) as Post,
a.Art_id,
a.Art,
a.V_id,
a.Reg,
a.Dep_V_id,
a.Dep_V,
concat(substr(a.Periode, 1, 4), '-', substr(a.Periode, 5, 2), '-15') as PeriodDate,
a.Period,
a.Year,
a.PeriodAmount,
sum(cast(b.PeriodAmount as dec(18,3))) as YTDAmount
from dfs.`/home/foo/data/a.csv` a
left join dfs.`/home/foo/data/a.csv` b
on
a.Dep_id = b.Dep_id
and a.Post_id = b.Post_id
and a.Post_id is not null
and a.Art_id = b.Art_id
and a.V_id = b.V_id
and a.Reg = b.Reg
and a.Dep_V_id = b.Dep_V_id
and a.Dep_id = b.Dep_id
and b.Period <= a.Period
and a.Year = b.Year
and a.Post_id = b.Post_id
and a.Art_id = b.Art_id
where a.Year in ('2018') and b.Year in (a.Year)
group by
a.Dep_id,
a.Dep,
a.Post_id,
a.Art_id,
a.Art,
a.V_id,
a.Reg,
a.Dep_V_id,
a.Dep_V,
a.Dep_id,
a.Period,
a.Year,
a.PeriodAmount
order by
a.Year,
a.Dep_id,
a.Post_id,
a.Art_id,
a.V_id,
a.Reg,
a.Dep_V_id,
a.Dep_id,
a.Period,
a.PeriodAmount
;
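I have not queried CSV files like this before, so this is more of a suggestion to try.
How about spelling out the where clause for both a and b, like this, to help the compiler:
WHERE a.Year = '2018' AND b.Year = '2018'
or
WHERE a.Year = '2018' AND b.Year = a.Year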
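One thing to watch with either form: this is a left join, and a condition on b in the where clause is evaluated after the join. Any row of a without a match has NULL in every b column, fails the b.Year test, and is dropped, so the join then behaves like an inner join. If unmatched rows should be kept, the b-side condition can go into the on clause instead. A reduced sketch with a shortened column list follows; whether Drill 1.13 can plan it has not been verified here.

```sql
-- Sketch: a-side filter in where, b-side filter in the join condition,
-- so rows of a without a matching b row survive (with a NULL YTDAmount).
select
    a.Dep_id,
    a.Post_id,
    a.Period,
    a.PeriodAmount,
    sum(cast(b.PeriodAmount as dec(18,3))) as YTDAmount
from dfs.`/home/foo/data/a.csv` a
left join dfs.`/home/foo/data/a.csv` b
    on  a.Dep_id  = b.Dep_id
    and a.Post_id = b.Post_id
    and a.Year    = b.Year
    and b.Period <= a.Period
    and b.Year    = '2018'         -- filter on b stays part of the join
where a.Year = '2018'              -- filter on a stays in the where clause
group by
    a.Dep_id, a.Post_id, a.Period, a.PeriodAmount
order by
    a.Dep_id, a.Post_id, a.Period
;
```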