Converting Apache Pig to Hive

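I'm trying to figure out what flattening "group" and this particular "flatten" code are doing. I have been looking at the code below on and off for a few days, trying to work out how to convert it to Hive, but I just don't get it. Usually, FLATTEN is used to create multiple rows for two or more columns that should end up with the same name in the output. In this case, though, I'm not sure what it does or how to replicate it in Hive. Any help would be greatly appreciated, since I don't have much time to work on this and I'm expected to have it finished and tested within the next few weeks. Thanks.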

Change_pop = GROUP IPChange_pop BY (acct_num, strategy_code);
Oldest_GLChange = FOREACH Change_pop {
    OList = ORDER IPChange_pop BY process_date ASC, new_loc DESC;
    Oldest = LIMIT OList 1;
    GENERATE
        FLATTEN(IPChange_pop) AS (email,acct_num,acct_nm,cust_num,type,strategy_code,process_date,last_5,cmGroup,current_loc,new_loc,update_ts),
        FLATTEN(group.strategy_code) AS grp_strategy_code,
        FLATTEN(Oldest.process_date) AS early_process_date,
        FLATTEN(Oldest.new_loc) AS early_new_loc;
};

FLATTEN is used to un-nest tuples, bags, and maps. Off the top of my head, I recall that the Hive equivalent would use the EXPLODE() function together with LATERAL VIEW.
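
For this particular script, though, I don't think EXPLODE() is actually needed. As far as I can tell, flattening the grouped bag simply gives back the original rows, flattening `group.strategy_code` repeats the group key on each row, and the two flattens on `Oldest` attach the `process_date` and `new_loc` of the first row in each (acct_num, strategy_code) group after sorting by process_date ASC, new_loc DESC. If that reading is right, a window function should reproduce it in Hive. Here is a rough, untested sketch, assuming the source data sits in a Hive table called `ipchange_pop` (a made-up name) with the same column names as the Pig relation:

```sql
-- Assumption: ipchange_pop is a hypothetical Hive table holding the same
-- rows and columns as the Pig relation that is being grouped.
SELECT
    t.email,
    t.acct_num,
    t.acct_nm,
    t.cust_num,
    t.`type`,            -- backticks in case the name clashes with a keyword
    t.strategy_code,
    t.process_date,
    t.last_5,
    t.cmGroup,
    t.current_loc,
    t.new_loc,
    t.update_ts,
    -- FLATTEN(group.strategy_code): the group key, repeated on every row
    t.strategy_code AS grp_strategy_code,
    -- FLATTEN(Oldest.process_date): value from the first row of the sorted group
    FIRST_VALUE(t.process_date) OVER (
        PARTITION BY t.acct_num, t.strategy_code
        ORDER BY t.process_date ASC, t.new_loc DESC
    ) AS early_process_date,
    -- FLATTEN(Oldest.new_loc): same first row, different column
    FIRST_VALUE(t.new_loc) OVER (
        PARTITION BY t.acct_num, t.strategy_code
        ORDER BY t.process_date ASC, t.new_loc DESC
    ) AS early_new_loc
FROM ipchange_pop t;
```

If several rows tie on both process_date and new_loc, Pig's LIMIT 1 picks one of them arbitrarily, but since only those two fields are read from it the result should be the same. EXPLODE() with LATERAL VIEW only comes into play if the rows are really stored nested in Hive, for example in an ARRAY or MAP column.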

https://pig.apache.org/docs/latest/basic.html#flatten

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-explode