Hive - Reformat data structure

I have a sample of Hive data:

Customer xx_var yy_var branchflow
{"customer_no":"239230293892839892","acct":["2324325","23425345"]} 23 3 [{"acctno":"2324325","value":[1,2,3,4,5,6,6,6,4]},{"acctno":"23425345","value":[1,2,3,4,5,6,6,6,99,4]}]

I want to transform it into this:

Customer_no acct xx_var yy_var branchflow
239230293892839892 2324325 23 3 [1,2,3,4,5,6,6,6,4]
239230293892839892 23425345 23 3 [1,2,3,4,5,6,6,6,99,4]

I tried the following query, but the output comes out in the wrong format.

SELECT 
    customer.customer_no,
    acct,
    xx_var,
    yy_var,
    bi_acctno,
    values_bi
FROM
    struct_test 
LATERAL VIEW explode(customer.acct) acct AS acctno
LATERAL VIEW explode(branchflow.acctno) bia as bi_acctno
LATERAL VIEW explode(branchflow.value) biv as values_bi
WHERE bi_acctno = acctno

Does anyone know how to solve this?

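Use json_tuple to extract the JSON elements. When an element is an array, json_tuple returns it as a string too, so remove the square brackets, split the string, and explode the result. See the comments in the demo code.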
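The strip, split, and explode steps can be tried on their own before wiring them into the full query. The following is a minimal sketch, not part of the original answer: the literal stands in for the acct array string that json_tuple would return, and the aliases s, a, and acct are chosen only for illustration. It assumes default Hive string-literal escaping, where a backslash inside the regex has to be doubled.

```sql
-- Sketch under assumptions: the inline subquery replaces a real table,
-- and the literal mimics the string json_tuple returns for the acct array.
select a.acct
  from (select '["2324325","23425345"]' as accts) s
       -- 1) regexp_replace strips the leading [, the trailing ] and every "
       -- 2) split cuts the remaining comma-separated string into an array
       -- 3) explode emits one row per array element
       lateral view explode(split(regexp_replace(s.accts, '^\\[|"|\\]$', ''), ',')) a as acct;
```

Each account number should come back as its own row, which is the shape the join condition in the full query relies on.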

Demo:

with mytable as (--demo data, use your table instead of this CTE
select '{"customer_no":"239230293892839892","acct":["2324325","23425345"]}' as customer,
       23 xx_var, 3 yy_var,
       '[{"acctno":"2324325","value":[1,2,3,4,5,6,6,6,4]},{"acctno":"23425345","value":[1,2,3,4,5,6,6,6,99,4]}]' branchflow
)

select c.customer_no,
       a.acct,
       t.xx_var, t.yy_var,
       get_json_object(b.acct_branchflow,'$.value') value
  from mytable t
       --extract customer_no and acct array
       lateral view json_tuple(t.customer, 'customer_no', 'acct') c as customer_no, accts
       --remove [] and " and explode array of acct
       --(backslashes are doubled because Hive string literals treat \ as an escape character)
       lateral view explode(split(regexp_replace(c.accts,'^\\[|"|\\]$',''),',')) a as acct
       --remove [] and explode array of json: split only on commas between } and {
       lateral view explode(split(regexp_replace(t.branchflow,'^\\[|\\]$',''),'(?<=\\}),(?=\\{)')) b as acct_branchflow
--this will remove duplicates after lateral view: need only matching acct
 where get_json_object(b.acct_branchflow,'$.acctno') = a.acct

Result:

customer_no         acct        xx_var  yy_var  value
239230293892839892  2324325     23      3       [1,2,3,4,5,6,6,6,4]
239230293892839892  23425345    23      3       [1,2,3,4,5,6,6,6,99,4]
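The query in the question addresses the columns as customer.acct and branchflow.acctno, which suggests they may be native Hive types rather than JSON strings. If that is the case, a possible variant is sketched below. It is an assumption, not something the question confirms: it presumes customer is declared as struct<customer_no:string,acct:array<string>> and branchflow as array<struct<acctno:string,value:array<int>>>, and the aliases a, b, and bf are illustrative. The idea is to explode branchflow once as an array of structs, so that each acctno stays paired with its own value array instead of being exploded separately.

```sql
-- Sketch under assumptions: native struct/array column types as described above.
select t.customer.customer_no,
       a.acct,
       t.xx_var,
       t.yy_var,
       b.bf.value as branchflow
  from struct_test t
       -- one row per account number of the customer
       lateral view explode(t.customer.acct) a as acct
       -- one row per branchflow struct; acctno and value stay together
       lateral view explode(t.branchflow) b as bf
 -- keep only the branchflow entry that belongs to the exploded account
 where b.bf.acctno = a.acct;
```

If the columns are in fact stored as strings, this variant does not apply and the json_tuple approach above is the one to use.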