Pig - Store a complex relation schema in a hive table

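Here is my problem for today. After reading a relation from Hive, I created a new relation as the result of several transformations. I want to store that final relation back in Hive after the analysis, but I can't. My code should make this clearer.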

The first block is where I load the data from Hive and transform it:

-- Load the July trips from Hive through HCatalog
july = LOAD 'POC.july' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- Extract the day of the month and keep only the columns needed
july_cl = FOREACH july GENERATE GetDay(ToDate(start_date)) AS day:int, start_station, duration;
jul_cl_fl = FILTER july_cl BY day == 31;
july_gr = GROUP jul_cl_fl BY (day, start_station);
-- Per (day, station): total duration, average duration and number of trips
july_result = FOREACH july_gr {
    total_dura = SUM(jul_cl_fl.duration);
    avg_dura = AVG(jul_cl_fl.duration);
    qty_trips = COUNT(jul_cl_fl);
    GENERATE FLATTEN(group), total_dura, avg_dura, qty_trips;
};

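So now, when I try to store the relation july_result, it fails, because the schema has changed and, I suppose, it is no longer compatible with the Hive table: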

STORE july_result INTO 'poc.july_analysis' USING org.apache.hive.hcatalog.pig.HCatStorer();
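
A quick way to see what changed is to ask Pig for the schema it has inferred before handing the relation to HCatStorer. This is a sketch: the types listed in the comments assume `duration` is an `int` column in `POC.july`, and the exact DESCRIBE output format may differ between Pig versions.

```pig
-- Print the schema Pig has inferred for the aggregated relation
DESCRIBE july_result;

-- Assuming duration is an int column, the fields should look roughly like:
--   group::day            int
--   group::start_station  (whatever type the Hive column has)
--   total_dura            long    (SUM over int values returns long)
--   avg_dura              double  (AVG returns double)
--   qty_trips             long    (COUNT returns long)
```

Two things stand out: the flattened group key keeps the `group::` prefix instead of plain column names, and the aggregates come back as `long` and `double` rather than `int`. HCatStorer compares the relation's field names and types with the columns of the target table, so either mismatch is a likely cause of the failed STORE.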

Even when I tried to declare a specific schema for the final relation, I couldn't get it to work:

july_result = FOREACH july_gr {
    total_dura = SUM(jul_cl_fl.duration);
    avg_dura = AVG(jul_cl_fl.duration);
    qty_trips = COUNT(jul_cl_fl);
    GENERATE FLATTEN(group) AS (day:int), total_dura AS (total_dura:int), avg_dura AS (avg_dura:int), qty_trips AS (qty_trips:int);
};

After some research in the Hortonworks Community, I found out how to define the output schema of a grouped relation in Pig. My new code looks like this:

july_result = FOREACH july_gr {
    total_dura = SUM(jul_cl_fl.duration);
    avg_dura = AVG(jul_cl_fl.duration);
    qty_trips = COUNT(jul_cl_fl);
    -- Name both fields of the group key and cast each aggregate explicitly
    GENERATE FLATTEN(group) AS (day, code_station),
             (int)total_dura AS (total_dura:int),
             (float)avg_dura AS (avg_dura:float),
             (int)qty_trips AS (qty_trips:int);
};

Thanks, everyone.