Storing data into a file with a specific format using Pig
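I have recently been working on a project where I need to output the final data in a specific format. My actual dataset is quite complex, so I will use dummy data to explain the problem.
Suppose I have the following data: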
1
2
3
4
5
5
4
2
1
Then I want to use Pig to output this data in the following format:
Between 4 and 8 2
Between 1 and 5 5
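Note: the bounds are exclusive. For 4 to 8, I do not include 4 or 8 themselves.
I tried the following code, but how do I add the text Between 4 and 8 to Pig's final output?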
data = LOAD 'f.txt' AS num:int;
data1 = GROUP data BY num;
data2 = FOREACH data1 GENERATE group AS num, COUNT(data) AS count;
data3 = FILTER data2 BY count > 4 AND count < 8;
data4 = FILTER data3 BY count > 1 AND count < 5;
From here, I do not know how to store data3 and data4 in a single file in the format I specified above.
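Create two filtered datasets, count each one, and combine the results into a single output with UNION. Before writing the output, put the text you want in front of each count. Note that the filters apply to num itself, not to a per-value count as in the attempt in the question.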
data = LOAD 'f.txt' AS num:int;
data3 = FILTER data BY num > 4 AND num < 8;
data4 = FILTER data BY num > 1 AND num < 5;
data3_grp = GROUP data3 ALL;
data3_count = FOREACH data3_grp GENERATE 'Between 4 and 8',COUNT(data3);
data4_grp = GROUP data4 ALL;
data4_count = FOREACH data4_grp GENERATE 'Between 1 and 5',COUNT(data4);
data5 = UNION data3_count, data4_count;
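To actually write the combined relation out, a STORE statement can follow the UNION. This is only a minimal sketch that continues from the data5 relation above: the output path 'between_counts' is a hypothetical name, and the space delimiter is an assumption chosen to match the format shown in the question.

```pig
-- Continues from the script above; assumes data5 holds the two (label, count) rows.
-- 'between_counts' is a hypothetical output directory; it must not already exist.
-- PigStorage(' ') separates the label and the count with a single space.
STORE data5 INTO 'between_counts' USING PigStorage(' ');
```

Pig writes the result as one or more part files inside that directory rather than as a single named file, and UNION does not guarantee the order of the rows. If exactly one file is needed, the parts can be merged afterwards, for example with hadoop fs -getmerge.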