Stop word removal went wrong

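For an information retrieval (IR) task, I want to extract some text snippets, and I would like to remove the stop words before analyzing them. To do that, I created a txt file of stop words and then tried to strip those useless words with the following code: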

private static void stopWordRemowal() throws FileNotFoundException, IOException {

    Set<String> stopWords = new LinkedHashSet<String>();
    BufferedReader br = new BufferedReader(new FileReader("StopWord.txt"));
    for (String line; (line = br.readLine()) != null;)
        stopWords.add(line.trim());

    BufferedReader br2 = new BufferedReader(new FileReader("text"));
    FileOutputStream theNewWords = new FileOutputStream(temp);

    for (String readReady; (readReady = br2.readLine()) != null;) {
        StringTokenizer tokenizer = new StringTokenizer(readReady);
        String temp = tokenizer.nextToken();
        if (!stopWords.equals(temp)) {
            theNewWords.write(temp.getBytes());
            theNewWords.write(System.getProperty("line.separator").getBytes());
        }
    }
}

But it does not actually work. Consider the following sample text snippet:

Text summarization is the process of extracting salient information from the source text and to present that 
information to the user in the form of summary

The output is as follows:

Text
summarization
is
the
process
of
extracting
salient
information
from
the
source
text
and
to
present
that
information
to
the
user
in
the
form
of
summary

The filter has no effect at all: every stop word is still there. But I don't know why.

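You should use the Set's contains method rather than equals. stopWords.equals(temp) compares the whole set with a single String, and a Set is never equal to a String, so the negated condition is always true and every word gets written. Use this:

 if(!stopWords.contains(temp)) // does the set contain the string temp?

instead of

if(!stopWords.equals(temp)) // is the set equal to a string? never true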
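
To put the fix in context, here is a minimal sketch of the filtering step on its own, using contains and walking over every token of a line. The class name StopWordFilter, the method removeStopWords, and the inline stop word list are hypothetical names introduced only for this illustration. Lower-casing the token before the lookup is an extra assumption (that the stop word file holds lower-case entries) and is not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical helper class, for illustration only.
public class StopWordFilter {

    // Returns the tokens of `text` that are not present in `stopWords`.
    // Assumption: stopWords holds lower-case entries, so each token is
    // lower-cased for the lookup only; the kept token keeps its original case.
    public static List<String> removeStopWords(Set<String> stopWords, String text) {
        List<String> kept = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        // Visit every token on the line, not just the first one.
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            // contains() tests membership; equals() would compare the whole set to the token.
            if (!stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Inline stand-in for the contents of StopWord.txt.
        Set<String> stopWords = new HashSet<>(
                Arrays.asList("is", "the", "of", "from", "and", "to", "that", "in"));
        String text = "Text summarization is the process of extracting salient "
                + "information from the source text";
        for (String word : removeStopWords(stopWords, text)) {
            System.out.println(word);
        }
    }
}
```

In the original method, the same loop body would write each kept token to theNewWords instead of collecting it in a list.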