Why do I receive an ACCESSING_NON_EXISTENT_FIELD warning after a JOIN and setting of an alias?

Question

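In the Pig script below, my value for ct "disappears" when I run a DUMP at any step after the e3 alias is set. For example, if I DUMP e4 immediately after assigning it, no value is returned for ct.

I also see the following warning in the output: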

[main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 9 time(s).

   eng_grp = GROUP engs BY (aid, scm_id,ts,etype);
   eng_grp_out = FOREACH eng_grp
               GENERATE
                   group.aid as aid,
                   group.scm_id as scm_id,
                   group.etype as etype,
                   group.ts as timestamp,
                   (long)COUNT_STAR(engs) as ct;

   eng_joined = JOIN eng_grp_out BY (aid,scm_id), tgc BY (aid, scm_id);

   e3 = FOREACH eng_joined GENERATE
         MD5((chararray)CONCAT(CONCAT(CONCAT(CONCAT(CONCAT(CONCAT(eng_grp_out::aid,'_'),eng_grp_out::scm_id),'_'),eng_grp_out::etype),'_'),(chararray)eng_grp_out::timestamp)) as id,
         eng_grp_out::aid as v,
         eng_grp_out::scm_id as scmid,
         eng_grp_out::etype AS et,
         eng_grp_out::timestamp as ts,
         FLATTEN(tgc::tags),
         eng_grp_out::ct as ct;

   -- the value for "ct" will be output if I do DUMP e3; here

   e4 = FOREACH e3 GENERATE
         id,
         v,
         scmid,
         et,
         ts,
         FLATTEN(tgc::tags::g) as gg,
         ct;
   -- the value for "ct" will NOT be output if I do DUMP e4; here
   e5 = FOREACH e4 GENERATE
         id,
         v,
         scmid,
         et,
         ts,
         gg#'g' as tg,
         gg#'v' as tv,
         gg#'d' as td,
         ct;

   e6 = FOREACH e5 GENERATE
         id,
         v,
         scmid,
         et,
         (long)ts,
         tg#'$oid' as tg,
         tv#'$oid' as tv,
         (chararray)td as td,
         ct;

   e7 = FOREACH e6 GENERATE
         id,
         v,
         scmid,
         et,
         ts,
         'c' as tt,
         tg,
         tv,
         td,
         ct;

   e8 = FOREACH e7 GENERATE
         id,v,scmid,et,ts,tt,
         CONCAT(CONCAT(CONCAT(CONCAT(tg,'_'),tv),'_'),td) as ct,
         tg,tv,td,ct;

Answer 1

I was finally able to get it working by changing the assignment of the e3 alias to

   e3 = FOREACH eng_joined GENERATE
         -- ...kept everything else the same...
         TOMAP('count_val', (long)eng_grp_out::ct);

From there I could get the value I needed in the e4 assignment via (long)#'count_val' as val.
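
The map round trip described in this answer can be tried in isolation with a small script. This is only a sketch under stated assumptions, not the original poster's code: the input file counts.tsv, the relation names counts, wrapped and unwrapped, and the map alias ct_map are all hypothetical. The answer does not show which alias the map was given, so an explicit one is assumed here.

```pig
-- Hypothetical input: counts.tsv, tab-separated, with two columns (aid, ct).
counts = LOAD 'counts.tsv' AS (aid:chararray, ct:long);

-- Wrap the count in a single-entry map and give the map an explicit alias
-- (ct_map is an assumed name; the answer above does not show one).
wrapped = FOREACH counts GENERATE
      aid,
      TOMAP('count_val', ct) AS ct_map;

-- Look the value up by key and cast it back to long.
unwrapped = FOREACH wrapped GENERATE
      aid,
      (long)ct_map#'count_val' AS val;

-- Print the declared schemas, then the data, to compare the two.
DESCRIBE wrapped;
DESCRIBE unwrapped;
DUMP unwrapped;
```

Running DESCRIBE on each intermediate alias of the original script (e3, e4, and so on) in the same way may help show where the declared schema and the actual tuples stop agreeing, which is the situation this warning usually points to.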

hadoop

apache-pig