Check if a string contains a list of substrings and save the matching ones
Here is my situation: I have a string that represents some text:
string myText = "Text to analyze for words, bar, foo";
and a list of words to search for in it:
List<string> words = new List<string> {"foo", "bar", "xyz"};
I would like to know the most efficient way (if one exists) to get the list of words that the text contains, for example:
List<string> matches = myText.findWords(words);
There is nothing special about this query beyond needing the Contains method, so you can try this:
string myText = "Text to analyze for words, bar, foo";
List<string> words = new List<string> { "foo", "bar", "xyz" };
var result = words.Where(i => myText.Contains(i)).ToList();
//result: foo, bar (matches come back in the order of the words list)
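Note that Contains performs a case-sensitive substring test, so a search word is also reported when it only occurs inside a longer word. The following is a minimal sketch of a case-insensitive variant; the class name ContainsDemo and the helper FindWordsIgnoreCase are hypothetical names chosen for illustration, and the extra search word "word" is added only to show the substring behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContainsDemo
{
    // Hypothetical helper: returns the search words that occur anywhere in text,
    // ignoring case. This is still a substring test, not a whole-word test.
    public static List<string> FindWordsIgnoreCase(string text, IEnumerable<string> words)
    {
        return words
            .Where(w => text.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }

    public static void Main()
    {
        string myText = "Text to analyze for words, bar, foo";
        var words = new List<string> { "FOO", "Bar", "xyz", "word" };

        List<string> result = FindWordsIgnoreCase(myText, words);

        // "word" is reported because it is a substring of "words".
        Console.WriteLine(string.Join(", ", result)); // prints: FOO, Bar, word
    }
}
```

If only whole words should count, the text has to be tokenized first or matched with word boundaries, which the split-based and regular expression answers in this thread address.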
You can use a HashSet<string> and intersect the two sets:
string myText = "Text to analyze for words, bar, foo";
string[] splitWords = myText.Split(' ', ',');
HashSet<string> hashWords = new HashSet<string>(splitWords,
    StringComparer.OrdinalIgnoreCase);
HashSet<string> words = new HashSet<string>(new[] { "foo", "bar" },
    StringComparer.OrdinalIgnoreCase);
// IntersectWith modifies hashWords in place, leaving only "bar" and "foo"
hashWords.IntersectWith(words);
Building on your wish to be able to call myText.findWords(words), you can create an extension method for the string class that does what you want.
public static class StringExtensions
{
    // Returns the words that occur in str as substrings (case-sensitive)
    public static List<string> findWords(this string str, List<string> words)
    {
        return words.Where(str.Contains).ToList();
    }
}
Usage:
string myText = "Text to analyze for words, bar, foo";
List<string> words = new List<string> { "foo", "bar", "xyz" };
List<string> matches = myText.findWords(words);
Console.WriteLine(String.Join(", ", matches.ToArray()));
Console.ReadLine();
Result:
foo, bar
A regular expression solution:
var words = new string[]{"Lucy", "play", "soccer"};
var text = "Lucy loves going to the field and play soccer with her friend";
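// Escape each word so regex metacharacters in it are matched literally (needs System.Linq)
var match = new Regex(String.Join("|", words.Select(Regex.Escape))).Match(text);
var result = new List<string>();
while (match.Success) {
    result.Add(match.Value);
    match = match.NextMatch();
}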
//Result ["Lucy", "play", "soccer"]
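The alternation pattern above also matches inside longer words and reports a word once per occurrence. A minimal sketch of a whole-word variant follows, assuming the search list is non-empty and every search word starts and ends with a letter, digit, or underscore (so that \b behaves as expected); the names WholeWordDemo and FindWholeWords and the sample sentence are hypothetical and chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WholeWordDemo
{
    // Hypothetical helper: returns the distinct search words that appear in text
    // as whole words. Assumes words is non-empty.
    public static List<string> FindWholeWords(string text, IEnumerable<string> words)
    {
        // Escape each word, join with '|', and wrap the group in \b anchors
        // so that "play" does not match inside "plays".
        string pattern = @"\b(?:" + string.Join("|", words.Select(Regex.Escape)) + @")\b";

        return Regex.Matches(text, pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .Distinct()
            .ToList();
    }

    public static void Main()
    {
        var words = new[] { "Lucy", "play", "soccer" };
        string text = "Lucy plays soccer with Lucy";

        // "Lucy" occurs twice but is reported once; "play" is not reported
        // because it only occurs as part of "plays".
        foreach (string word in FindWholeWords(text, words))
        {
            Console.WriteLine(word);
        }
    }
}
```

Passing RegexOptions.IgnoreCase to Regex.Matches would make the match case-insensitive, at the cost of returning the casing found in the text instead of the casing in the search list.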
Here is a simple solution that accounts for whitespace and punctuation:
static void Main(string[] args)
{
    string sentence = "Text to analyze for words, bar, foo";
    var words = Regex.Split(sentence, @"\W+");
    var searchWords = new List<string> { "foo", "bar", "xyz" };
    var foundWords = words.Intersect(searchWords);
    foreach (var item in foundWords)
    {
        Console.WriteLine(item);
    }
    Console.ReadLine();
}