Removing stop words from a string

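I am trying to remove stop words from a string, but the problem is that it also removes characters from inside other words whenever the stop word's letters appear elsewhere in the string.
For example, if the original string is "this movie good." the resulting string is "this movie good.", which works fine. But
if the string is "this movie is good."
then the resulting string is "th movie good."
Because "is" also occurs inside "this", it is stripped from that word in the result as well.
Another string: "this game is fantastic. So, I watched and played a lot."
Result: "gme fntstic. So, wtched plyed lot."
Because "a" occurs repeatedly in this string, the resulting string shows every word with the "a" removed.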

I am using this code:

// stopWordsFilter() is a function returning the list of stop words read from a file.
List<string> stopWordsList = stopWordsFilter();
string propertyValue = "this game is fantastic. So, I watched and played a lot.";
foreach (string word1 in propertyValue.Split(' '))
{
    foreach (var word in stopWordsList)
    {
        if (word.Equals(word1) && word.Length == word1.Length)
        {
            propertyValue = propertyValue.Replace(word, "");
        }
    }
}
Console.WriteLine(propertyValue);

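The problem is that you replace the stop word with String.Empty. String.Replace does not care about words, only about substrings, so every occurrence of the stop word's characters is removed, including those inside other words.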
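To make the substring behaviour concrete, here is a minimal sketch. The class name ReplaceDemo and the helper NaiveRemove are hypothetical names used only for illustration; the helper simply wraps String.Replace the way the loop in the question uses it.

```csharp
using System;

public static class ReplaceDemo
{
    // Hypothetical helper: removes `word` wherever it occurs as a substring,
    // which is exactly what String.Replace(word, "") does.
    public static string NaiveRemove(string text, string word)
    {
        return text.Replace(word, "");
    }

    public static void Main()
    {
        // "is" is removed both as a standalone word and from inside "this".
        Console.WriteLine(NaiveRemove("this movie is good.", "is"));
        // prints "th movie  good." (note the doubled space left behind)
    }
}
```

The equality check in the question only decides whether Replace is called at all; once it is called, it operates on the entire string rather than on the single word that matched.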

You can use this approach instead:

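string propertyValue = "this game is fantastic. So, I watched and played a lot.";
// Split() with no arguments splits on whitespace.
var words = propertyValue.Split();
// stopWordsList is the list returned by stopWordsFilter() in the question.
var newWords = words.Except(stopWordsList);
propertyValue = string.Join(" ", newWords);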

If you want to ignore case, so that "Is" is omitted as well:

var newWords = words.Except(stopWordsList, StringComparer.InvariantCultureIgnoreCase);
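One caveat worth knowing: Enumerable.Except is a set operation, so it also collapses repeated non-stop words in the input into a single occurrence. If duplicates should survive, a filter over a HashSet is one alternative. The following is only a sketch under assumptions: the class StopWordFilterDemo, the helper RemoveStopWords, and the hard-coded stop-word array are hypothetical stand-ins for the file-based list in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordFilterDemo
{
    // Hypothetical helper: drops whole tokens that appear in the stop-word set,
    // ignoring case, and keeps every other token (duplicates included).
    public static string RemoveStopWords(string text, IEnumerable<string> stopWords)
    {
        var stopSet = new HashSet<string>(stopWords, StringComparer.OrdinalIgnoreCase);
        var kept = text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(token => !stopSet.Contains(token));
        return string.Join(" ", kept);
    }

    public static void Main()
    {
        // Assumed sample list; the real one would come from stopWordsFilter().
        var stopWords = new[] { "this", "is", "a", "and", "i" };
        Console.WriteLine(RemoveStopWords(
            "this game is fantastic. So, I watched and played a lot.", stopWords));
        // prints "game fantastic. So, watched played lot."
    }
}
```

Because whole tokens are compared, "watched" and "played" keep their letters, and a repeated word such as "lot" in "a lot and a lot" is kept twice.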

Here is a solution using LINQ:

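    // Keep only the words that are not in the stop-word list, then re-join them.
    // DefaultIfEmpty guards Aggregate, which throws on an empty sequence.
    string result = propertyValue.Split(' ')
        .Where(s => !stopWordsList.Contains(s))
        .DefaultIfEmpty(string.Empty)
        .Aggregate((current, next) => current + " " + next);
    Console.WriteLine(result);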
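Both splitting approaches compare whole tokens, so a stop word with punctuation attached, such as "is," or "a.", is not matched. A different technique that handles this is a regular expression with \b word boundaries. This is a sketch only, not part of the answers above: the class RegexStopWordDemo and the helper RemoveStopWords are hypothetical, and the stop words are passed in as a plain array.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexStopWordDemo
{
    // Hypothetical helper: removes stop words only where they stand as whole words.
    public static string RemoveStopWords(string text, string[] stopWords)
    {
        if (stopWords.Length == 0) return text;

        // \b(?:this|is|...)\b matches a stop word only at word boundaries,
        // so "is" inside "this" is left alone.
        string pattern = @"\b(?:" +
            string.Join("|", stopWords.Select(w => Regex.Escape(w))) + @")\b";
        string stripped = Regex.Replace(text, pattern, "", RegexOptions.IgnoreCase);

        // Collapse the runs of whitespace left where words were removed.
        return Regex.Replace(stripped, @"\s{2,}", " ").Trim();
    }

    public static void Main()
    {
        Console.WriteLine(RemoveStopWords("this movie is good.", new[] { "is" }));
        // prints "this movie good."
    }
}
```

A removed word that was followed by punctuation leaves that punctuation behind, so additional cleanup may be needed depending on the input.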