Why does the DUMP operator return a path?
I have a simple Pig script:
CRE_28001 = LOAD '$input' USING PigStorage(';') AS (CIA_CD_CRV_CIA:chararray,CIA_DA_EM_CRV:chararray,CIA_CD_CTRL_BLCE:chararray);
-- Generate the file's columns
Data = FOREACH CRE_28001 GENERATE
(chararray) CIA_CD_CRV_CIA AS CIA_CD_CRV_CIA,
(chararray) CIA_DA_EM_CRV AS CIA_DA_EM_CRV,
(chararray) CIA_CD_CTRL_BLCE AS CIA_CD_CTRL_BLCE,
(chararray) RUB_202 AS RUB_202;
-- Apply the required filter
CRE_28001_FILTER = FILTER Data BY (RUB_202 == '6');
LIMIT_DATA = LIMIT CRE_28001_FILTER 10;
DUMP LIMIT_DATA;
I am sure my filter is correct: the RUB_202 column contains the value '6' in more than 100 rows, and I verified this many times.
Here is what I got:
Input(s):
Successfully read 444 records (583792 bytes) from: "/hdfs/data/adhoc/PR/02/RDO0/BB0/MGM28001-2019-08-19.csv"
Output(s):
Successfully stored 0 records in: "hdfs://ha-manny/hdfs/hadoop/pig/tmp/temp1618713487/tmp-1281522727"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1549794175705_3500029 -> job_1549794175705_3500031,
job_1549794175705_3500031
Note that I never asked to store any data in hdfs://ha-manny/hdfs/hadoop/pig/tmp/temp1618713487/tmp-1281522727.
Why was this path generated automatically, and why can I not see any of the data, when all I wanted was to look at the result of the filter?

I figured it out: the solution is to reference the columns by positional index instead of by name. In other words:
Data = FOREACH CRE_28001 GENERATE
(chararray) $0 AS CIA_CD_CRV_CIA,
(chararray) $1 AS CIA_DA_EM_CRV,
(chararray) $2 AS CIA_CD_CTRL_BLCE,
(chararray) $3 AS RUB_202;
Then I also applied the TRIM function, because some columns contain surrounding spaces in the data. It works.
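For completeness, the whole fix can be sketched as one script (a sketch only, assuming the same four-column, semicolon-delimited layout; TRIM is the Pig builtin that strips leading and trailing whitespace from a chararray):

```pig
-- Load without a named schema, then cast and trim each positional field
CRE_28001 = LOAD '$input' USING PigStorage(';');
Data = FOREACH CRE_28001 GENERATE
TRIM((chararray) $0) AS CIA_CD_CRV_CIA,
TRIM((chararray) $1) AS CIA_DA_EM_CRV,
TRIM((chararray) $2) AS CIA_CD_CTRL_BLCE,
TRIM((chararray) $3) AS RUB_202;
-- With the surrounding whitespace removed, the equality filter matches
CRE_28001_FILTER = FILTER Data BY (RUB_202 == '6');
LIMIT_DATA = LIMIT CRE_28001_FILTER 10;
DUMP LIMIT_DATA;
```

This also explains the 0-record output above: a value like '6 ' is not equal to '6', so the filter silently dropped every row until the fields were trimmed.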