How to find out how many tuples are in a GROUP
This is my input:
10001 AMERICAN EXPRESS,TX, Y
10001 BOFA,IL,N
10001 CHASE,NJ,Y
10002 CHASE,IL,Y
10002 BOFA,TX,Y
10002 AMERICAN EXPRESS,NJ,Y
10001 AMERICAN EXPRESS,TX, Y
10001 BOFA,IL,N
10001 CHASE,NJ,Y
10002 CHASE,IL,Y
10002 BOFA,TX,Y
I have to group the rows by my key.
Intermediate output:
10001, {(AMERICAN EXPRESS,TX,Y),(BOFA,IL,N),(CHASE,NJ,Y)}
10002, {(CHASE,IL,Y),(BOFA,TX,Y)}
10001, {(AMERICAN EXPRESS,TX,Y),(BOFA,IL,N),(CHASE,NJ,Y)}
10002, {(CHASE,IL,Y),(BOFA,TX,Y)}
Then, for every key whose group contains more than one tuple, I have to find out how many tuples that group holds.
10001, count(tuples) > 1, count = 3
10002, count(tuples) > 1, count = 2
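Can anyone help me?
Apply COUNT to the second field of each group's bag after grouping, then keep only the groups whose count is greater than 1.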
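-- Load the rows; this assumes every field, including the key, is comma-separated.
A = LOAD 'data.txt' USING PigStorage(',') AS (f1:int,f2:chararray,f3:chararray,f4:chararray);
-- Group by key; each group holds a bag named A.
B = GROUP A BY f1;
-- Count the f2 values in each group's bag (COUNT skips nulls).
C = FOREACH B GENERATE group, COUNT(A.f2) AS Total;
-- Keep only the groups with more than one tuple.
D = FILTER C BY (Total > 1);
DUMP D;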
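If you want to sanity-check the expected counts without running Pig, the same group, count, and filter steps can be mirrored in plain Python. This is only a rough sketch: the function name `count_groups` and the `min_count` parameter are made up for illustration, and the rows are hard-coded from the first few lines of the question on the assumption that the key is a separate first field.

```python
from collections import defaultdict

# Sample rows taken from the question, assuming the key is its own field.
rows = [
    (10001, "AMERICAN EXPRESS", "TX", "Y"),
    (10001, "BOFA", "IL", "N"),
    (10001, "CHASE", "NJ", "Y"),
    (10002, "CHASE", "IL", "Y"),
    (10002, "BOFA", "TX", "Y"),
]

def count_groups(rows, min_count=2):
    """Group rows by their first field and keep keys with at least min_count rows."""
    groups = defaultdict(list)
    for key, *rest in rows:
        groups[key].append(tuple(rest))               # like GROUP A BY f1
    counts = {k: len(v) for k, v in groups.items()}   # like COUNT per group
    return {k: n for k, n in counts.items() if n >= min_count}  # like FILTER Total > 1

result = count_groups(rows)
print(result)  # {10001: 3, 10002: 2}
```

Note that `len` counts every row, whereas Pig's COUNT skips tuples whose first field is null, so the two can differ on data with missing values.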