Multiple lateral views with explode in Hive produce duplicate rows
I am working with 4 columns:

Ref_No | Currency | Amount | Tag |
---|---|---|---|
EBDR001 | usd^usd^usd^usd^ | 240^300^210^500^ | DBC^ODA^ICA^DRA |
I want the data in this format:

Ref_No | Currency | Amount | Tag |
---|---|---|---|
EBDR001 | usd | 240 | DBC |
EBDR001 | usd | 300 | ODA |
EBDR001 | usd | 210 | ICA |
EBDR001 | usd | 500 | DRA |
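This is the result I am getting:

Ref_No | Currency | Amount | Tag |
---|---|---|---|
EBDR001 | usd | 240 | DBC |
EBDR001 | usd | 240 | DBC |
EBDR001 | usd | 240 | DBC |
EBDR001 | usd | 240 | DBC |
EBDR001 | usd | 300 | ODA |
EBDR001 | usd | 300 | ODA |
EBDR001 | usd | 300 | ODA |
EBDR001 | usd | 300 | ODA |
EBDR001 | usd | 210 | ICA |
EBDR001 | usd | 210 | ICA |
EBDR001 | usd | 210 | ICA |
EBDR001 | usd | 210 | ICA |
EBDR001 | usd | 500 | DRA |
EBDR001 | usd | 500 | DRA |
EBDR001 | usd | 500 | DRA |
EBDR001 | usd | 500 | DRA |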
There are more than a thousand rows like this, each with a different Ref_No.

The query I am using is:

    select Ref,ccy,amt,tag_1 from table1
    lateral view explode(split(ccy,"\^")) myTable12 as ccy
    lateral view explode(split(amt,"\^")) myTable13 as amt
    lateral view explode(split(tag_1 ,"\^")) myTable14 as tag_1
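Multiple lateral views produce a Cartesian product: each exploded array is cross-joined with the rows produced by the previous lateral views. The solution is to use a single lateral view with posexplode, split the other columns into arrays, and use the position to address their values.

Some columns contain an extra ^ delimiter at the end; remove it before splitting.

Demo:

    with table1 as ( -- Data example, use your table instead
        select 'EBDR001' Ref_No, 'usd^usd^usd^usd^' Currency, '240^300^210^500^' Amount, 'DBC^ODA^ICA^DRA' Tag
    )
    select Ref_No,
           c.ccy,
           split(t1.Amount,"\\^")[c.pos] amt,  -- element at the same position as ccy
           split(t1.Tag,"\\^")[c.pos] tag
      from ( -- Remove the extra delimiter at the end
            select Ref_No,
                   regexp_replace(Currency,'\\^$','') Currency,
                   regexp_replace(Amount,'\\^$','') Amount,
                   regexp_replace(Tag,'\\^$','') Tag
              from table1
           ) t1
           lateral view posexplode(split(t1.Currency,"\\^")) c as pos, ccy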
Result:

ref_no | c.ccy | amt | tag |
---|---|---|---|
EBDR001 | usd | 240 | DBC |
EBDR001 | usd | 300 | ODA |
EBDR001 | usd | 210 | ICA |
EBDR001 | usd | 500 | DRA |
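Possible alternatives:

You can keep three lateral views, but use posexplode instead of explode in each of them and add a WHERE clause that matches the positions across the three lateral views.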
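A minimal sketch of this first alternative, reusing the sample data and the trailing-delimiter cleanup from the demo above. The aliases `c`, `a`, `t` and the position column names `c_pos`, `a_pos`, `t_pos` are chosen here for illustration and are not from the original answer:

```sql
with table1 as ( -- Sample data from the demo, use your table instead
    select 'EBDR001' Ref_No, 'usd^usd^usd^usd^' Currency, '240^300^210^500^' Amount, 'DBC^ODA^ICA^DRA' Tag
)
select t1.Ref_No,
       c.ccy,
       a.amt,
       t.tag
  from ( -- Remove the extra delimiter at the end
        select Ref_No,
               regexp_replace(Currency,'\\^$','') Currency,
               regexp_replace(Amount,'\\^$','') Amount,
               regexp_replace(Tag,'\\^$','') Tag
          from table1
       ) t1
       -- Each posexplode returns the element together with its 0-based position
       lateral view posexplode(split(t1.Currency,'\\^')) c as c_pos, ccy
       lateral view posexplode(split(t1.Amount,'\\^')) a as a_pos, amt
       lateral view posexplode(split(t1.Tag,'\\^')) t as t_pos, tag
 -- Keep only the combinations where all three positions line up
 where c.c_pos = a.a_pos
   and c.c_pos = t.t_pos
```

The three lateral views still combine every element with every other element before the WHERE clause discards the mismatched positions, so for long arrays the single-posexplode demo is likely the cheaper option.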
You can also use three CTEs, each of which applies posexplode to one of the arrays, and then LEFT JOIN them with the main table on Ref and pos.
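One way the CTE-based alternative could look is sketched below. The CTE names `ccy_x`, `amt_x`, `tag_x` and the column alias `val` are hypothetical, and the sketch makes two assumptions that are not stated in the original answer: `Ref_No` is unique per source row, and the exploded Currency rows are used as the driving set because the main table itself has no position column to join on.

```sql
with table1 as ( -- Sample data from the demo, use your table instead
    select 'EBDR001' Ref_No, 'usd^usd^usd^usd^' Currency, '240^300^210^500^' Amount, 'DBC^ODA^ICA^DRA' Tag
),
ccy_x as ( -- One row per (Ref_No, position) for Currency
    select s.Ref_No, e.pos, e.val as ccy
      from table1 s
           lateral view posexplode(split(regexp_replace(s.Currency,'\\^$',''),'\\^')) e as pos, val
),
amt_x as ( -- One row per (Ref_No, position) for Amount
    select s.Ref_No, e.pos, e.val as amt
      from table1 s
           lateral view posexplode(split(regexp_replace(s.Amount,'\\^$',''),'\\^')) e as pos, val
),
tag_x as ( -- One row per (Ref_No, position) for Tag
    select s.Ref_No, e.pos, e.val as tag
      from table1 s
           lateral view posexplode(split(regexp_replace(s.Tag,'\\^$',''),'\\^')) e as pos, val
)
select c.Ref_No,
       c.ccy,
       a.amt,
       t.tag
  from ccy_x c
       -- LEFT JOIN keeps a currency row even if the other arrays are shorter
       left join amt_x a on a.Ref_No = c.Ref_No and a.pos = c.pos
       left join tag_x t on t.Ref_No = c.Ref_No and t.pos = c.pos
```

If `Ref_No` can repeat in the source table, the join on `Ref_No` and `pos` would multiply rows again, so a unique row key would be needed in place of `Ref_No`.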