Check if a string contains a list of substrings and save the matching ones

Here is my situation: I have a string representing a text

string myText = "Text to analyze for words, bar, foo";   

and a list of words to search for in it

List<string> words = new List<string> {"foo", "bar", "xyz"};

I'd like to know the most efficient way (if there is one) to get the list of words that the text contains, something like:

List<string> matches = myText.findWords(words);

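There is nothing special about this query beyond calling the Contains method for each word. Note that Contains does a case-sensitive substring match, so "bar" would also match inside "barn". You can try this: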

string myText = "Text to analyze for words, bar, foo";

List<string> words = new List<string> { "foo", "bar", "xyz" };

var result = words.Where(i => myText.Contains(i)).ToList();
// result: foo, bar (Where keeps the order of the words list)
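
If the match should ignore case, one possible variation is to swap Contains for IndexOf with an explicit StringComparison. This is only a minimal sketch: it assumes C# 9 or later (for top-level statements), and the sample text has been altered to mix upper and lower case purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample text changed for illustration: "Bar" and "FOO" differ in case from the search words.
string myText = "Text to analyze for words, Bar, FOO";
List<string> words = new List<string> { "foo", "bar", "xyz" };

// IndexOf returns -1 when the substring is not found; OrdinalIgnoreCase ignores case.
List<string> result = words
    .Where(w => myText.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0)
    .ToList();

Console.WriteLine(string.Join(", ", result)); // foo, bar
```

This is still a substring match, so it does not distinguish whole words from parts of longer words.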

You can use a HashSet<string> and intersect the two sets. Unlike Contains, this compares whole words, and here it ignores case:

string myText = "Text to analyze for words, bar, foo"; 
string[] splitWords = myText.Split(' ', ',');

HashSet<string> hashWords = new HashSet<string>(splitWords,
                                                StringComparer.OrdinalIgnoreCase);
HashSet<string> words = new HashSet<string>(new[] { "foo", "bar" },
                                            StringComparer.OrdinalIgnoreCase);

// IntersectWith modifies hashWords in place; afterwards it holds only "bar" and "foo".
hashWords.IntersectWith(words);

Building on your idea of being able to call myText.findWords(words), you can create an extension method for the string class that does what you want.

public static class StringExtensions
{
    // Returns the entries of words that occur in str as substrings (case-sensitive).
    public static List<string> findWords(this string str, List<string> words)
    {
        return words.Where(str.Contains).ToList();
    }
}

Usage:

string myText = "Text to analyze for words, bar, foo";
List<string> words = new List<string> { "foo", "bar", "xyz" };
List<string> matches = myText.findWords(words);
Console.WriteLine(String.Join(", ", matches.ToArray()));
Console.ReadLine();

Result:

foo, bar

A regular expression solution:

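// Requires: using System.Linq; using System.Text.RegularExpressions;
var words = new string[]{"Lucy", "play", "soccer"};
var text = "Lucy loves going to the field and play soccer with her friend";
// Escape each word so regex metacharacters (".", "+", ...) are matched literally.
var pattern = String.Join("|", words.Select(w => Regex.Escape(w)));
var match = new Regex(pattern).Match(text);
var result = new List<string>();

// Collect every occurrence, in the order it appears in the text.
while (match.Success) {
    result.Add(match.Value);
    match = match.NextMatch();
}

//Result ["Lucy", "play", "soccer"]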
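
The alternation pattern above matches substrings and records every occurrence. If only whole words are wanted, and each word should be listed once, a possible variation is to wrap the alternation in \b word boundaries and de-duplicate the matches. The following is a sketch under assumptions: it uses C# 9 top-level statements, the sample text and word list are invented for illustration, and \b only behaves as expected when the search words begin and end with word characters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample data: "plays" must not count as a match for "play".
var words = new[] { "play", "soccer", "xyz" };
var text = "Lucy plays soccer and likes to play soccer on the field";

// \b anchors the alternation to word boundaries; Regex.Escape keeps the words literal.
string pattern = @"\b(?:" + string.Join("|", words.Select(w => Regex.Escape(w))) + @")\b";

// Distinct() drops the repeated "soccer" match.
List<string> found = Regex.Matches(text, pattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .Distinct()
    .ToList();

Console.WriteLine(string.Join(", ", found)); // the two matches: "soccer" and "play"
```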

Here is a simple solution that takes whitespace and punctuation into account:

static void Main(string[] args)
{
    string sentence = "Text to analyze for words, bar, foo";            
    var words = Regex.Split(sentence, @"\W+");
    var searchWords = new List<string> { "foo", "bar", "xyz" };
    var foundWords = words.Intersect(searchWords);

    foreach (var item in foundWords)
    {
        Console.WriteLine(item);
    }

    Console.ReadLine();
}