How to remove duplicate rows produced by posexplode in Hive
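I am using posexplode.
How can I remove the duplicate rows that result from parsing the column?
I can't use DISTINCT because several of the columns are empty (before parsing).
Example before posexplode: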
id | cofe |
AAA | |||-9000| 4 |
BBB | 5|90 |
CCC | |
DDD | 6||||| |
EEE | |
Unfortunately, the result is:
id | cofe
AAA |
AAA |
AAA | -9000
AAA | 4
BBB | 5
BBB | 90
CCC |
DDD | 6
DDD |
DDD |
DDD |
DDD |
EEE |
Expected result:
id | cofe
AAA | -9000
AAA | 4
BBB | 5
BBB | 90
CCC |
DDD | 6
EEE |
SELECT qq.id,
       s1.cofe,
       s2.fnte,
       s3.cnte
from
(
    select id,
           sequence,
           split(BMWA, '~')[15] AS CFEEA,
           split(BMWA, '~')[16] AS FTAAA,
           split(BMWA, '~')[17] AS CNTTA
    FROM
    (
        select id,
               sequence,
               replace(bmw, '^', '~') AS BMWA
        from tablee
    ) rr
) qq
lateral view posexplode(split(replace(qq.CFEEA, '|', '~'), '~')) s1 as r1, cofe
lateral view posexplode(split(replace(qq.FTAAA, '|', '~'), '~')) s2 as r2, fnte
lateral view posexplode(split(replace(qq.CNTTA, '|', '~'), '~')) s3 as r3, cnte
Any ideas would be greatly appreciated!
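If you want to skip empty elements when splitting the string, replace each run of consecutive delimiters with a single delimiter before splitting, and also remove the leading and trailing delimiters.
For example, with '|||-9000| 4' (pipe-delimited):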
select split(
         regexp_replace(
           -- replace runs of 2+ consecutive delimiters with a single one
           -- (the backslash is doubled because Hive string literals unescape it)
           regexp_replace('|||-9000| 4', '\\|{2,}', '|'),  -- gives '|-9000| 4'
           -- remove the leading and trailing delimiter
           '^\\||\\|$', ''),                               -- gives '-9000| 4'
         -- split on the delimiter
         '\\|')                                            -- gives array ["-9000"," 4"]
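To see why the extra rows appear in the first place, it can help to compare the array sizes with and without the cleanup. The following is only a sketch on the sample literal; the column aliases raw_size and cleaned_size are made up for illustration. A plain split keeps the empty strings between consecutive delimiters, so every one of them becomes its own row after posexplode.

```sql
-- Sketch: compare a plain split with the cleaned split on the sample literal.
-- raw_size counts the empty elements created by the leading '|||';
-- cleaned_size counts only the non-empty values.
select size(split('|||-9000| 4', '\\|')) as raw_size,
       size(split(
              regexp_replace(
                regexp_replace('|||-9000| 4', '\\|{2,}', '|'),
                '^\\||\\|$', ''),
              '\\|')) as cleaned_size;
```

On this literal the raw array should have 5 elements (three empty strings followed by '-9000' and ' 4'), while the cleaned array has 2.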
With your sample data:
with mytable as (
  -- sample rows from the question
  select stack(5,
           'AAA', '|||-9000| 4',
           'BBB', ' 5|90',
           'CCC', '',
           'DDD', '6|||||',
           'EEE', ''
         ) as (id, cofe)
)
select id, e.val as cofe
from mytable
lateral view outer posexplode(
  split(
    regexp_replace(
      regexp_replace(cofe, '\\|{2,}', '|'),  -- collapse repeated delimiters
      '^\\||\\|$', ''),                      -- strip leading/trailing delimiter
    '\\|')
) e as pos, val
Result:
id cofe
AAA -9000
AAA 4
BBB 5
BBB 90
CCC
DDD 6
EEE
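Also note that multiple LATERAL VIEW posexplode clauses can produce a Cartesian product of the exploded values for each row. See the answer on how to explode multiple arrays of different lengths by position.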
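One possible way to avoid the Cartesian product is to explode only one array and look up the other arrays by the same position. This is a minimal sketch and not necessarily the approach of the linked answer. The fnte sample values are invented for illustration, and it assumes the exploded column (cofe here) has at least as many elements as the others, since any extra elements in the other arrays are silently dropped.

```sql
-- Sketch: explode one array and index the second one by position,
-- instead of chaining two LATERAL VIEW posexplode clauses.
with mytable as (
  -- hypothetical sample rows; fnte values are made up for this example
  select stack(2,
           'AAA', '1|2|3', 'x|y',
           'BBB', '4',     'z'
         ) as (id, cofe, fnte)
)
select t.id,
       e.pos,
       e.val                       as cofe,
       split(t.fnte, '\\|')[e.pos] as fnte  -- NULL when fnte has no element at this position
from mytable t
lateral view outer posexplode(split(t.cofe, '\\|')) e as pos, val;
```

Each input row then yields one output row per element of cofe rather than one row per combination of elements. The same delimiter cleanup shown earlier can be applied to each column before splitting.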