Filtering inside a FOREACH block by a condition value calculated in a different block in a Pig script
I have two datasets. For each row of dataset 2, I need to find the dataset 1 records whose key matches key1 and key2, like this:
dataset1 = [sourceID, details, key]
1, details1, 1111
2, details2, 1112
3, details3, 1113
4, details4, 1114
...
dataset2 = [key1, key2, number]
1111,1112,3
1111,1114,1
1112,1113,11
...
Expected output:
1, details1, 1111, 2, details2, 1112, 3
1, details1, 1111, 4, details4, 1114, 1
2, details2, 1112, 3, details3, 1113, 11
....
I tried the following:
a = foreach dataset1 {
b = filter dataset2 by dataset1.key1 matches dataset1.key;
c = filter dataset2 by datset1.key2 matches dataset1.key;
generate b, c;
};
Any help would be appreciated.
Thanks a lot.
How about running two joins?
B = join dataset1 by key, dataset2 by key1;
C = join dataset1 by key, B by key2;
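The two joins above could be written out in full roughly as follows. This is only a sketch under some assumptions that are not in the question: the file names dataset1.csv and dataset2.csv are hypothetical, the fields are comma-separated without padding spaces, and the keys are loaded as chararray. Loading dataset1 a second time under the alias dataset1b is a cautious choice so that the second join has an alias of its own. The second join puts B on the left so that the columns come out in the order of the expected output.

```pig
-- Hypothetical input files and schemas; adjust to the real data.
dataset1  = LOAD 'dataset1.csv' USING PigStorage(',')
            AS (sourceID:int, details:chararray, key:chararray);
-- Second alias over the same file, used only for the key2 lookup.
dataset1b = LOAD 'dataset1.csv' USING PigStorage(',')
            AS (sourceID:int, details:chararray, key:chararray);
dataset2  = LOAD 'dataset2.csv' USING PigStorage(',')
            AS (key1:chararray, key2:chararray, number:int);

-- First join: attach the dataset1 record that matches key1.
-- B has six columns: $0 sourceID, $1 details, $2 key, $3 key1, $4 key2, $5 number.
B = JOIN dataset1 BY key, dataset2 BY key1;

-- Second join: attach the dataset1 record that matches key2.
-- The dataset1b columns follow B's, at positions $6, $7 and $8.
C = JOIN B BY dataset2::key2, dataset1b BY key;

-- Keep the first match, then the second match, then the number.
-- Positional references are used to avoid relying on the prefixed
-- field names that Pig generates after a join.
result = FOREACH C GENERATE $0, $1, $2, $6, $7, $8, $5;

DUMP result;
```

Because both joins are inner joins, a row of dataset2 whose key1 or key2 has no counterpart in dataset1 is dropped from the result.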