Implementing UPPER, TRIM and REPLACE in Apache Pig

I am new to the Pig environment. I tried to implement my Pig script in two ways.

I.

data = LOAD 'sample2.txt' USING PigStorage(',') as(campaign_id:chararray,date:chararray,time:chararray,display_site:chararray,placement:chararray,was_clicked:int,cpc:int,keyword:chararray);

distinct_data = DISTINCT data;

val = foreach distinct_data generate campaign_id,date,time,UPPER(keyword),display_site,placement,was_clicked,cpc;

val1 = foreach val generate campaign_id,date,time,TRIM(keyword),display_site,placement,was_clicked,cpc;

val2 = foreach val1 generate campaign_id,REPLACE(date, '-', '/'),time,keyword,display_site,placement,was_clicked,cpc;

dump val2;

I get this error:

2016-09-29 02:45:40,826 INFO org.apache.pig.Main: Apache Pig version 0.10.0-cdh4.2.1 (rexported) compiled Apr 22 2013, 12:04:54
2016-09-29 02:45:40,827 INFO org.apache.pig.Main: Logging error messages to: /home/training/training_materials/analyst/exercises/pig_etl/pig_1475131540824.log
2016-09-29 02:45:42,371 ERROR org.apache.pig.tools.grunt.Grunt: ERROR 1025: Invalid field projection. Projected field [keyword] does not exist in schema: campaign_id:chararray,date:chararray,time:chararray,org.apache.pig.builtin.upper_keyword_12:chararray,display_site:chararray,placement:chararray,was_clicked:int,cpc:int.
Details at logfile: /home/hduser/pig_etl/pig_1475131540824.log

But when I combine UPPER, TRIM and REPLACE in a single statement, it works:

II.

data = LOAD 'sample2.txt' USING PigStorage(',') as(campaign_id:chararray,date:chararray,time:chararray,display_site:chararray,placement:chararray,was_clicked:int,cpc:int,keyword:chararray);

distinct_data = DISTINCT data;

val = foreach distinct_data generate campaign_id,REPLACE(date, '-', '/'),time,TRIM(UPPER(keyword)),display_site,placement,was_clicked,cpc;
dump val;

So I would just like someone to explain why method I does not work and what the error message means.

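When you apply TRIM in val1, val no longer contains a field named "keyword". UPPER(keyword) was generated without an alias, so Pig gave the result a name of its own: the org.apache.pig.builtin.upper_keyword_12 field listed in the schema of the error message. That is what ERROR 1025 is telling you: the projected field keyword does not exist in the schema of val. Method II works because keyword and date are both resolved against distinct_data, where the original field names still exist.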

Whenever you apply a function to a field, give the result an alias with "as"; that way you avoid this error.

It is also good practice to run DESCRIBE on a relation before building a new relation from it, so that its schema is clear to you.
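As a minimal sketch of that check, using the same input file and schema as in the question, the statements below describe the relation once without and once with an alias. The relation name val_named is made up for this illustration. The first DESCRIBE should show that the fourth field is no longer called keyword (on the Pig 0.10 build from the question, the error message suggests a generated name such as org.apache.pig.builtin.upper_keyword_12; other versions may display it differently), while the second should list it as keyword.

```pig
-- Same load as in the question.
data = LOAD 'sample2.txt' USING PigStorage(',') AS (campaign_id:chararray, date:chararray, time:chararray, display_site:chararray, placement:chararray, was_clicked:int, cpc:int, keyword:chararray);

distinct_data = DISTINCT data;

-- No alias on UPPER(keyword): the output field loses the name keyword.
val = FOREACH distinct_data GENERATE campaign_id, date, time, UPPER(keyword), display_site, placement, was_clicked, cpc;
DESCRIBE val;

-- With an alias: the output field is called keyword again.
-- val_named is a hypothetical relation name used only in this sketch.
val_named = FOREACH distinct_data GENERATE campaign_id, date, time, UPPER(keyword) AS keyword, display_site, placement, was_clicked, cpc;
DESCRIBE val_named;
```

Running DESCRIBE after each FOREACH in the Grunt shell makes it easy to see which field names the next statement is allowed to reference.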

The solution is:

data = LOAD 'sample2.txt' USING PigStorage(',') as(campaign_id:chararray,date:chararray,time:chararray,display_site:chararray,placement:chararray,was_clicked:int,cpc:int,keyword:chararray);

distinct_data = DISTINCT data;

val = foreach distinct_data generate campaign_id,date,time,UPPER(keyword) as keyword,display_site,placement,was_clicked,cpc;

val1 = foreach val generate campaign_id,date,time,TRIM(keyword) as keyword,display_site,placement,was_clicked,cpc;

val2 = foreach val1 generate campaign_id,REPLACE(date, '-', '/') as date,time,keyword,display_site,placement,was_clicked,cpc;

dump val2;
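If an earlier relation has already been built without an alias, Pig's positional notation offers another way to reach the unnamed field. This is only a sketch of that alternative, not part of the solution above: it assumes the field order from the question, where the UPPER result is the fourth field and is therefore $3 (positions start at 0).

```pig
-- Same load as in the question.
data = LOAD 'sample2.txt' USING PigStorage(',') AS (campaign_id:chararray, date:chararray, time:chararray, display_site:chararray, placement:chararray, was_clicked:int, cpc:int, keyword:chararray);

distinct_data = DISTINCT data;

-- As in method I: no alias, so the fourth field has no usable name.
val = FOREACH distinct_data GENERATE campaign_id, date, time, UPPER(keyword), display_site, placement, was_clicked, cpc;

-- Refer to the fourth field by position ($3) and name it again here.
val1 = FOREACH val GENERATE campaign_id, date, time, TRIM($3) AS keyword, display_site, placement, was_clicked, cpc;

-- keyword and date now exist by name, so this statement resolves.
val2 = FOREACH val1 GENERATE campaign_id, REPLACE(date, '-', '/') AS date, time, keyword, display_site, placement, was_clicked, cpc;

DUMP val2;
```

Positional references break silently when the field order changes, so the aliases used in the solution are usually the easier approach to maintain.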