Hive CASE Resulting in Duplicate Rows
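I have a table containing contact numbers and a reference table containing a "length" column and a numeric prefix column.
What I need is the prefix name from the reference table whose prefix matches the start of the number, and when several prefixes match it should be the longest one. (God, I hope that makes sense.)
What I have tried so far: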
select a.record_type,a.number,b.prefix,b.prefix_name
from first_table a , second_table b
where a.transaction_date=20180924 and case
when b.length=1 then substr(a.number,1,1)=b.prefix
when b.length=2 then substr(a.number,1,2)=b.prefix
when b.length=3 then substr(a.number,1,3)=b.prefix
when b.length=4 then substr(a.number,1,4)=b.prefix
when b.length=5 then substr(a.number,1,5)=b.prefix
when b.length=6 then substr(a.number,1,6)=b.prefix
when b.length=7 then substr(a.number,1,7)=b.prefix
when b.length=8 then substr(a.number,1,8)=b.prefix
when b.length=9 then substr(a.number,1,9)=b.prefix
when b.length=10 then substr(a.number,1,10)=b.prefix
when b.length=11 then substr(a.number,1,11)=b.prefix
when b.length=12 then substr(a.number,1,12)=b.prefix
when b.length=13 then substr(a.number,1,13)=b.prefix
when b.length=14 then substr(a.number,1,14)=b.prefix
end
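But it still returns duplicate results. For example, if the number is 12345, it matches the reference rows with prefixes 1234 and 123, while I only want the 1234 one.
Is there any way to make the CASE prioritize? Thanks.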
Sample data from both tables: example
My current results and the desired results: results
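A note on why the CASE cannot prioritize: the WHERE clause is evaluated once per (number, prefix) pair, so each reference row is judged on its own and every row that matches survives. The sketch below reproduces this outside Hive, using Python's bundled SQLite as a stand-in. The sample rows are made up for illustration, and the CASE is cut down to three lengths.

```python
import sqlite3

# Toy copies of the two tables; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_table (record_type TEXT, number TEXT, transaction_date INTEGER);
    CREATE TABLE second_table (prefix TEXT, prefix_name TEXT, length INTEGER);
    INSERT INTO first_table VALUES ('a', '12345', 20180924);
    INSERT INTO second_table VALUES
        ('123', 'Type A', 3), ('1234', 'Type B', 4), ('99', 'Type C', 2);
""")

# Same shape as the query in the question, reduced to three lengths.
# The CASE is tested separately for every (a, b) pair of the cross join.
rows = conn.execute("""
    SELECT a.number, b.prefix, b.prefix_name
    FROM first_table a, second_table b
    WHERE a.transaction_date = 20180924
      AND CASE
            WHEN b.length = 2 THEN substr(a.number, 1, 2) = b.prefix
            WHEN b.length = 3 THEN substr(a.number, 1, 3) = b.prefix
            WHEN b.length = 4 THEN substr(a.number, 1, 4) = b.prefix
          END
    ORDER BY b.length
""").fetchall()

print(rows)
```

Both the 123 row and the 1234 row come back for 12345, which is the duplication described above. Keeping only the longest match needs a step that compares the matches for one number against each other, such as a ranking or an aggregate.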
OK, I have revised it; try this:
WITH FIRST_TABLE (RECORD_TYPE,NUM,TRANSACTION_DATE)AS (
SELECT 'a',12345, DATE '2018-09-24' FROM DUAL
),
SECOND_TABLE (PREFIX,PREFIX_NAME,LENGTH) AS(
SELECT 12,'Type A', 2 FROM DUAL union all
SELECT 1234,'Type B', 4 FROM DUAL
)
select * from (
SELECT A.RECORD_TYPE,A.NUM,B.PREFIX,B.PREFIX_NAME, MAX(B.PREFIX) OVER (PARTITION BY A.RECORD_TYPE,A.NUM) maxPrefix
FROM FIRST_TABLE A ,SECOND_TABLE B
WHERE A.TRANSACTION_DATE=DATE '2018-09-24'
AND A.NUM LIKE (B.PREFIX||'%')
)
where PREFIX=maxPrefix;
You can use row_number():
select ap.*
from (select a.record_type, a.number, p.prefix, p.prefix_name,
row_number() over (partition by a.record_type, a.number order by p.length desc) as seqnum
from first_table a join
second_table p
on (p.length = 1 and substr(a.number, 1, 1) = p.prefix) or
(p.length = 2 and substr(a.number, 1, 2) = p.prefix) or
. . .
(p.length = 14 and substr(a.number, 1, 14) = p.prefix)
where a.transaction_date = 20180924
) ap
where seqnum = 1;
This can be expressed more concisely as:
select ap.*
from (select a.record_type, a.number, p.prefix, p.prefix_name,
row_number() over (partition by a.record_type, a.number order by p.length desc) as seqnum
from first_table a join
second_table p
on substr(a.number, 1, p.length) = p.prefix
where a.transaction_date = 20180924
) ap
where seqnum = 1;
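To check the row_number() version without a Hive cluster, here is a minimal sketch that runs the same query shape in SQLite through Python. It assumes SQLite 3.25 or later (needed for window functions), stores numbers and prefixes as text, and uses made-up sample rows; Hive's casting behaviour may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_table (record_type TEXT, number TEXT, transaction_date INTEGER);
    CREATE TABLE second_table (prefix TEXT, prefix_name TEXT, length INTEGER);
    INSERT INTO first_table VALUES
        ('a', '12345', 20180924), ('a', '12999', 20180924);
    INSERT INTO second_table VALUES
        ('12', 'Type A', 2), ('123', 'Type B', 3), ('1234', 'Type C', 4);
""")

# Rank every matching prefix per number, longest first, and keep rank 1.
best = conn.execute("""
    SELECT number, prefix, prefix_name
    FROM (SELECT a.number, p.prefix, p.prefix_name,
                 row_number() OVER (PARTITION BY a.record_type, a.number
                                    ORDER BY p.length DESC) AS seqnum
          FROM first_table a JOIN second_table p
            ON substr(a.number, 1, p.length) = p.prefix
          WHERE a.transaction_date = 20180924) ap
    WHERE seqnum = 1
    ORDER BY number
""").fetchall()

print(best)
```

Here 12345 matches all three prefixes but only the length-4 row is kept, while 12999 falls back to the length-2 prefix. Note that the inner join drops numbers with no matching prefix at all.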
Another approach uses a chain of left joins for the comparisons and stops at the first match:
select a.record_type, a.number,
coalesce(p14.prefix, p13.prefix, . . . , p1.prefix) as prefix,
coalesce(p14.prefix_name, p13.prefix_name, . . . , p1.prefix_name) as prefix_name
from first_table a left join
second_table p14
on p14.length = 14 and substr(a.number, 1, 14) = p14.prefix left join
second_table p13
on p13.length = 13 and substr(a.number, 1, 13) = p13.prefix and p14.prefix is null left join
second_table p12
on p12.length = 12 and substr(a.number, 1, 12) = p12.prefix and p13.prefix is null left join
. . .
second_table p1
on p1.length = 1 and substr(a.number, 1, 1) = p1.prefix and p2.prefix is null
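The idea behind the left join chain is "try the longest length first and fall back one character at a time". As a rough illustration of that logic only (not of how Hive executes it), here is a small Python sketch. The function name `longest_prefix` and the sample dictionary are hypothetical.

```python
def longest_prefix(number, prefixes, max_len=14):
    """Return (prefix, name) for the longest matching prefix, or None.

    `prefixes` maps prefix strings to prefix names. Lengths are tried
    from longest to shortest, like the p14 ... p1 joins, and the first
    hit wins, like coalesce().
    """
    for length in range(min(max_len, len(number)), 0, -1):
        candidate = number[:length]
        if candidate in prefixes:
            return candidate, prefixes[candidate]
    return None


# Made-up reference data for illustration.
prefixes = {"12": "Type A", "123": "Type B", "1234": "Type C"}

match = longest_prefix("12345", prefixes)
no_match = longest_prefix("98765", prefixes)
print(match, no_match)
```

As with the left joins, a number with no matching prefix is still kept, just with an empty result, which is the main behavioural difference from the inner join in the row_number() version.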