How can I filter lines in Apache Pig?
I have a text file, and I load its lines with this script:
lines = LOAD '/user/hadoop/HDFS_File.txt' AS (line:chararray);
I need to filter each line by a given phrase. For example, if the line is:
'Hi, I'm lord Stark, how are you?'
I need to search for "how are you" in every line of the file and count the occurrences.
I tried:
sentences = FOREACH lines GENERATE (FILTER lines BY (f1 matches 'how are you')) AS sent;
But it does not work.
Please help.
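To keep only the records that contain the string "how are you", apply FILTER to the relation itself rather than inside FOREACH, and refer to the field by its declared name (line, not f1). The matches operator has to match the whole field, so the phrase needs .* on both sides: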
lines = LOAD '/user/hadoop/HDFS_File.txt' AS (line:chararray);
sentence = FILTER lines BY (line matches '.*how are you.*');
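If the phrase may appear with different capitalization, a case-insensitive variant could look like the sketch below. It relies on Pig handing the pattern to Java's regular-expression engine, so the inline flag (?i) should apply. The alias sentence_ci is a name introduced here for illustration only.

```pig
-- Same load as above: one line of text per record.
lines = LOAD '/user/hadoop/HDFS_File.txt' AS (line:chararray);
-- (?i) turns on case-insensitive matching;
-- .* on both sides lets the phrase sit anywhere in the line.
sentence_ci = FILTER lines BY (line matches '(?i).*how are you.*');
```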
To get the count (the number of lines that contain the phrase):
grouped = GROUP sentence ALL;
sentence_COUNT = FOREACH grouped GENERATE COUNT(sentence);
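Note that the count above is the number of matching lines. If a line can contain the phrase more than once and every occurrence should be counted, one possible sketch is to compare each line's length before and after removing the phrase. The aliases per_line, all_lines, and total are hypothetical names, and the divisor 11 is the length of 'how are you'. This has not been run against the original data, so treat it as a starting point.

```pig
lines = LOAD '/user/hadoop/HDFS_File.txt' AS (line:chararray);
-- Occurrences in a line = characters removed / length of the phrase (11).
per_line = FOREACH lines GENERATE
    (SIZE(line) - SIZE(REPLACE(line, 'how are you', ''))) / 11 AS n;
all_lines = GROUP per_line ALL;
-- Sum the per-line counts into a single total.
total = FOREACH all_lines GENERATE SUM(per_line.n);
DUMP total;
```

REPLACE treats its second argument as a regular expression, so a phrase containing characters such as ? or . would need escaping.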