How can I filter lines in Apache Pig?

I have a text file, and I load its lines with this script:

lines = LOAD '/user/hadoop/HDFS_File.txt' AS (line:chararray);

I need to filter each line by certain words. For example, if the line is:

'Hi, I'm lord Stark, how are you?'

I need to search for the phrase "how are you" in every line of the text file and count the occurrences.

I tried:

sentences = FOREACH lines GENERATE (FILTER lines BY (f1 matches 'how are you')) AS sent;

But it does not work. Please help me.

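The matches operator has to match the whole field, so the pattern needs .* on both sides of the phrase. Use the following to keep only the records that contain the string "how are you":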

lines = LOAD '/user/hadoop/HDFS_File.txt' AS (line:chararray);
sentence = FILTER lines BY (line matches '.*how are you.*');
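
As a variation, here is a minimal sketch that assumes the search should ignore case (the question does not say so). Pig's matches uses Java regular expressions, so the inline (?i) flag should work in the pattern. The relation name sentence_ci is hypothetical.

```pig
-- Load each line of the file as a single chararray field.
lines = LOAD '/user/hadoop/HDFS_File.txt' AS (line:chararray);

-- (?i) makes the pattern case-insensitive; the leading and trailing .*
-- are still needed because matches must cover the whole line.
sentence_ci = FILTER lines BY (line matches '(?i).*how are you.*');
```

With this pattern, a line containing "How Are You" would be kept as well.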

To get the number of occurrences (that is, the number of matching lines):

grouped = GROUP sentence ALL;
sentence_COUNT = FOREACH grouped GENERATE COUNT(sentence);
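
Note that COUNT here gives the number of lines that contain the phrase, so a line with the phrase twice is counted once. If every occurrence should be counted, one possible approach is to measure how many characters disappear when the phrase is removed from each line. The following is an untested sketch under that assumption; the relation and field names (per_line, hits, all_lines, total_hits) are hypothetical, and the divisor 11 is the length of 'how are you'.

```pig
-- Load each line of the file as a single chararray field.
lines = LOAD '/user/hadoop/HDFS_File.txt' AS (line:chararray);

-- Occurrences per line: characters removed by REPLACE, divided by the
-- phrase length (11 characters in 'how are you').
per_line = FOREACH lines GENERATE
    (SIZE(line) - SIZE(REPLACE(line, 'how are you', ''))) / 11 AS hits;

-- Put all rows into one group and add up the per-line counts.
all_lines = GROUP per_line ALL;
total_hits = FOREACH all_lines GENERATE SUM(per_line.hits);

-- Print the total to the console.
DUMP total_hits;
```

REPLACE treats its second argument as a regular expression, so a phrase containing characters such as . or ? would need escaping.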