Why do I receive an ACCESSING_NON_EXISTENT_FIELD warning after a JOIN and setting of an alias?
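In the Pig script below, my value for ct "disappears" when I run a DUMP at any step after the e3 alias is set. For example, if I do a DUMP on e4 immediately after setting that alias, no value is returned for ct.
I also see the following warning in the output: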
[main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 9 time(s).
eng_grp = GROUP engs BY (aid, scm_id, ts, etype);
eng_grp_out = FOREACH eng_grp GENERATE
    group.aid as aid,
    group.scm_id as scm_id,
    group.etype as etype,
    group.ts as timestamp,
    (long)COUNT_STAR(engs) as ct;
eng_joined = JOIN eng_grp_out BY (aid, scm_id), tgc BY (aid, scm_id);
e3 = FOREACH eng_joined GENERATE
    MD5((chararray)CONCAT(CONCAT(CONCAT(CONCAT(CONCAT(CONCAT(eng_grp_out::aid,'_'),eng_grp_out::scm_id),'_'),eng_grp_out::etype),'_'),(chararray)eng_grp_out::timestamp)) as id,
    eng_grp_out::aid as v,
    eng_grp_out::scm_id as scmid,
    eng_grp_out::etype AS et,
    eng_grp_out::timestamp as ts,
    FLATTEN(tgc::tags),
    eng_grp_out::ct as ct;
-- the value for "ct" will be output if I do DUMP e3; here
e4 = FOREACH e3 GENERATE
    id,
    v,
    scmid,
    et,
    ts,
    FLATTEN(tgc::tags::g) as gg,
    ct;
-- the value for "ct" will NOT be output if I do DUMP e4; here
e5 = FOREACH e4 GENERATE
    id,
    v,
    scmid,
    et,
    ts,
    gg#'g' as tg,
    gg#'v' as tv,
    gg#'d' as td,
    ct;
e6 = FOREACH e5 GENERATE
    id,
    v,
    scmid,
    et,
    (long)ts,
    tg#'$oid' as tg,
    tv#'$oid' as tv,
    (chararray)td as td,
    ct;
e7 = FOREACH e6 GENERATE
    id,
    v,
    scmid,
    et,
    ts,
    'c' as tt,
    tg,
    tv,
    td,
    ct;
e8 = FOREACH e7 GENERATE
    id, v, scmid, et, ts, tt,
    CONCAT(CONCAT(CONCAT(CONCAT(tg,'_'),tv),'_'),td) as ct,
    tg, tv, td, ct;
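One way to narrow this down, offered as a sketch rather than a confirmed diagnosis, is to compare the schema Pig infers for each alias with the tuples it actually produces. This assumes the warning comes from a field position that exists in the schema but not in the data, which nothing above confirms. The names e3_sample and e4_sample are hypothetical and are not part of the script.

```pig
-- Print the schema Pig has inferred for each alias.
DESCRIBE e3;
DESCRIBE e4;

-- Dump a few rows of each alias and compare the number of
-- fields per tuple with the schemas printed above.
e3_sample = LIMIT e3 5;
DUMP e3_sample;
e4_sample = LIMIT e4 5;
DUMP e4_sample;
```

If the dumped tuples have more or fewer fields than DESCRIBE reports after the FLATTEN, a field referenced by name such as ct may resolve to the wrong position.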
I was finally able to get it to work by changing the assignment of the e3 alias to:
e3 = FOREACH eng_joined GENERATE
    -- ...kept everything else the same...
    TOMAP('count_val', (long)eng_grp_out::ct);
From there, I can get the value I need in the e4 assignment via (long)#'count_val' as val.
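To make the workaround concrete, here is a trimmed sketch of how the two assignments could fit together. It keeps only a few of the original fields, and it gives the map an alias, ct_map, which is a hypothetical name: the TOMAP field in the post is unnamed, and the field reference in front of #'count_val' is not shown there.

```pig
-- Sketch only: ct_map is a hypothetical alias, not taken from the post.
e3 = FOREACH eng_joined GENERATE
    eng_grp_out::aid AS v,
    eng_grp_out::scm_id AS scmid,
    FLATTEN(tgc::tags),
    -- Wrap the count in a single-entry map instead of emitting it as a plain field.
    TOMAP('count_val', (long)eng_grp_out::ct) AS ct_map;

-- Look the count up by key and cast it back to long.
e4 = FOREACH e3 GENERATE
    v,
    scmid,
    (long)ct_map#'count_val' AS val;
```

Naming the map keeps the lookup in e4 readable; without an alias the field would have to be referenced by position.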