Remove stop words from a string in ASP.NET C#
I'm having trouble writing code that removes stop words from a string. Here is my code:
String Review="The portfolio is fine except for the fact that the last movement of sonata #6 is missing. What should one expect?";
string[] arrStopword = new string[] {"a", "i", "it", "am", "at", "on", "in", "to", "too", "very","of", "from", "here", "even", "the", "but", "and", "is","my","them", "then", "this", "that", "than", "though", "so", "are"};
StringBuilder sbReview = new StringBuilder(Review);
foreach (string word in arrStopword){
sbReview.Replace(word, "");}
Label1.Text = sbReview.ToString();
When I run it, Label1.Text = "The portfolo s fne except for fct tht lst movement st #6 s mssng. Wht should e expect? "
I expect it to return "portofolio fine except for fact last movement sonata #6 is missing. what should one expect?"
Does anyone know how to solve this?
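The problem is that you are matching substrings, not words. You need to split the original text into words, remove the stop words, and then join what is left back together.
Try this: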
List<string> words = Review.Split(' ').ToList();
foreach (string stopWord in arrStopword)
    // RemoveAll drops every occurrence; Remove would only drop the first one
    words.RemoveAll(w => w == stopWord);
string result = String.Join(" ", words);
The only problem I can see is that it doesn't handle punctuation well, but you get the general idea.
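For the punctuation issue mentioned above, one possible variation is to match whole words with a regular expression instead of splitting on spaces. The following is only a sketch under some assumptions: the class name StopWordFilter and the method name RemoveStopWords are made up for illustration, matching is case-insensitive (so "The" is removed as well as "the"), and runs of whitespace left behind are collapsed to a single space.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answers.
public static class StopWordFilter
{
    public static string RemoveStopWords(string text, string[] stopWords)
    {
        // Build \b(?:a|i|it|...)\b so only whole words match, never substrings.
        // Regex.Escape guards against stop words containing regex metacharacters.
        string pattern = @"\b(?:" + string.Join("|", stopWords.Select(Regex.Escape)) + @")\b";

        // Remove the matched words regardless of case.
        string stripped = Regex.Replace(text, pattern, "", RegexOptions.IgnoreCase);

        // Collapse the gaps left by removed words into single spaces.
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }
}
```

Called as StopWordFilter.RemoveStopWords(Review, arrStopword), this keeps "#6" and "missing." intact because punctuation is never part of a match. A stop word directly followed by punctuation would still leave a space before that punctuation, so some extra cleanup may be needed depending on the input.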
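You can use LINQ to solve this. First use the Split function to convert the string into a list of strings separated by " " (space), then use Except to get the words your result should contain, and finally apply string.Join: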
var newString = string.Join(" ", Review.Split(' ').Except(arrStopword));
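One thing to keep in mind with Except is that it is a set operation: it returns distinct elements, so a non-stop word that appears twice in the input comes back only once, and the default comparison is case-sensitive, so "The" is not removed by "the". If that matters, a filter with Where may be a closer fit. The sketch below assumes a made-up class name StopWordLinq and uses a case-insensitive HashSet for the lookup; like the one-liner above, it still splits on spaces only, so a token such as "is." is not treated as a stop word.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original question or answers.
public static class StopWordLinq
{
    public static string RemoveStopWords(string text, IEnumerable<string> stopWords)
    {
        // Case-insensitive lookup so "The" and "the" are both treated as stop words.
        var stopSet = new HashSet<string>(stopWords, StringComparer.OrdinalIgnoreCase);

        // Where keeps every remaining token, including repeated ones,
        // whereas Except would return each distinct token only once.
        var kept = text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(token => !stopSet.Contains(token));

        return string.Join(" ", kept);
    }
}
```

For example, StopWordLinq.RemoveStopWords("The cat saw the cat", new[] { "the" }) keeps both occurrences of "cat", while the Except version would keep only one.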
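You could use " a ", " I ", etc. to make sure the program only removes those words when they are used as words (that is, with spaces around them). Just replace them with a single space to keep the formatting intact.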
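A rough sketch of that space-padded replacement might look like the following. The class name StopWordPadded is hypothetical, and two choices here are assumptions rather than part of the suggestion: the text is padded with a space at each end so the first and last words can match, and each replacement is repeated until nothing matches so that the same stop word occurring twice in a row is fully removed. Matching is case-sensitive, so "The" survives unless it is in the list too.

```csharp
using System;

// Hypothetical helper, not part of the original question or answers.
public static class StopWordPadded
{
    public static string RemoveStopWords(string text, string[] stopWords)
    {
        // Pad both ends so words at the start and end are also surrounded by spaces.
        string padded = " " + text + " ";

        foreach (string word in stopWords)
        {
            string token = " " + word + " ";

            // A single Replace pass can miss the second of two adjacent
            // occurrences (they share a space), so repeat until none remain.
            while (padded.Contains(token))
                padded = padded.Replace(token, " ");
        }

        // Drop the padding added at the start.
        return padded.Trim();
    }
}
```

With the stop words "the", "is" and "on", the input "The cat is on the the mat" becomes "The cat mat". A stop word followed by punctuation, such as "is.", is left alone because it is not surrounded by spaces.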
Or you can use the dotnet-stop-words package. Just call the RemoveStopWords method:
(yourString).RemoveStopWords("en");