Conditional statements in PIG
I have the following input in a text file and need to generate output in another file based on the logic below.
This is my input file:
customerid|Dateofsubscription|Customercode|CustomerType|CustomerText
1001|2017-05-23|455|CODE|SPRINT56
1001|2017-05-23|455|DESC|Unlimited Plan
1001|2017-05-23|455|DATE|2017-05-05
1002|2017-05-24|455|CODE|SPRINT56
1002|2017-05-24|455|DESC|Unlimited Plan
1002|2017-05-24|455|DATE|2017-05-06
Logic:
If Customercode = 455
    if( CustomerType = "CODE" )
        Val = CustomerText
    if( CustomerType = "DESC" )
        Description = CustomerText
    if( CustomerType = "DATE" )
        Date = CustomerText
Output:
customerid|Val|Description|Date
1001|SPRINT56|Unlimited Plan|2017-05-05
1002|SPRINT56|Unlimited Plan|2017-05-06
Can you help me solve this?
Filter the input for Customercode = 455, generate the two columns you need, group by customerid, and then use BagToString.
-- A is the input relation, already loaded with the column names from the header
B = FILTER A BY Customercode == 455;
-- Keep only the first column (customerid) and the fifth column (CustomerText)
C = FOREACH B GENERATE $0 AS CustomerId, $4 AS CustomerText;
D = GROUP C BY CustomerId;
-- Note: E holds (1001,SPRINT56|Unlimited Plan|2017-05-05), so the first field still has to be
-- concatenated with '|' and then with the second field, which is already delimited by '|'.
E = FOREACH D GENERATE group AS CustomerId, BagToString(C.CustomerText, '|');
-- The cast lets CONCAT accept CustomerId even when it was loaded as a number
F = FOREACH E GENERATE CONCAT(CONCAT((chararray)$0, '|'), $1);
DUMP F;
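-- 'data' is a placeholder path; the schema follows the header line of the input file
rawData = LOAD 'data' USING PigStorage('|') AS (customerId:int, Dateofsubscription:chararray, Customercode:int, CustomerType:chararray, CustomerText:chararray);
filteredData = FILTER rawData BY (Customercode == 455);
-- Set Val/Description/Date from CustomerText according to CustomerType, and to null otherwise
-- (Pig string literals use single quotes)
ExtractedData = FOREACH filteredData GENERATE
    customerId,
    (CustomerType == 'CODE' ? CustomerText : null) AS Val,
    (CustomerType == 'DESC' ? CustomerText : null) AS Description,
    (CustomerType == 'DATE' ? CustomerText : null) AS Date;
groupedData = GROUP ExtractedData BY customerId;
-- MAX ignores nulls, so each column keeps its one non-null value per customer
finalData = FOREACH groupedData GENERATE
    group AS CustomerId,
    MAX(ExtractedData.Val) AS Val,
    MAX(ExtractedData.Description) AS Description,
    MAX(ExtractedData.Date) AS Date;
DUMP finalData;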
I have specified the core logic. Loading, formatting, and storing should be straightforward.
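For the storing step that the answer leaves out, a minimal sketch might look like the following. It assumes the finalData relation from the script above. The output path 'output' is a hypothetical placeholder, and PigStorage('|') is chosen only to match the pipe-delimited format shown in the question. By default Pig does not write a header row, so the customerid|Val|Description|Date line would have to be added separately.

```pig
-- Write one pipe-delimited line per customer, e.g. 1001|SPRINT56|Unlimited Plan|2017-05-05
-- 'output' is a hypothetical directory name; it must not exist before the job runs
STORE finalData INTO 'output' USING PigStorage('|');
```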