C# How to know which elements of a list are substrings of a string?

If I have a list of strings like this:
var MyList = new List<string>
{
    "substring1", "substring2", "substring3", "substring4", "substring5"
};

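Is there an efficient way to determine which elements of that list are contained in the following string?

"substring1 is the substring2 document that was processed electronically"

In this case, the result should be: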

var MySubList = new List<string>
{
    "substring1", "substring2"
};

We can use LINQ's Where to check, for each candidate substring, whether the larger string Contains it:

var MyList = new List<string>
{
    "substring1", "substring2", "substring3", "substring4", "substring5"
};

var Text = "substring1 is the substring2 document that was processed electronically";

var output = MyList.Where(x => Text.Contains(x)).ToList();
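
Note that string.Contains performs an ordinal, case-sensitive comparison by default. If casing should be ignored, a minimal sketch could pass a StringComparison value. This assumes .NET Core 2.1 or later (including .NET 5+), where the Contains(string, StringComparison) overload is available; the mixed-case sample text is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var myList = new List<string>
{
    "substring1", "substring2", "substring3", "substring4", "substring5"
};

// Hypothetical input whose casing differs from the list entries.
var text = "SubString1 is the SUBSTRING2 document that was processed electronically";

// Keep every candidate that occurs in the text, ignoring case.
var output = myList
    .Where(x => text.Contains(x, StringComparison.OrdinalIgnoreCase))
    .ToList();

Console.WriteLine(string.Join(", ", output)); // prints: substring1, substring2
```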
  1. Split Text on spaces
  2. Sort the words alphabetically
  3. Create a list of unique words from them
var words = Text.Split(" ").OrderBy(word => word).Distinct().ToList();
  1. Create an accumulator collection for the matches
  2. Create two index variables (one for words, one for patterns)
List<string> matches = new();
int patternIdx = 0, wordIdx = 0;
  1. Iterate over the lists until the end of either collection is reached
while(patternIdx < patterns.Count && wordIdx < words.Count)
{
    // the comparison and index updates from the next step go here
}
  1. Perform a string comparison
  2. Advance the index variables based on the comparison result
int comparison = string.Compare(patterns[patternIdx],words[wordIdx]);
switch(comparison)
{
    case > 0: wordIdx++; break;
    case < 0: patternIdx++; break;
    default: 
    {
        matches.Add(patterns[patternIdx]); 
        wordIdx++;
        patternIdx++;
        break;
    }
}

Here I used a feature new in C# 9: relational patterns in a switch statement.
If you cannot use C# 9, an if ... else if ... else block works just as well.
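
For compilers older than C# 9, the same branching can be sketched with a plain if / else if / else chain. The version below is only an illustration: the FindMatches method name is hypothetical, and it also avoids the target-typed new() and top-level statements, both of which are C# 9 features as well. Split(' ') is used instead of Split(" ") because the char overload is available on older frameworks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Program
{
    // Walks two sorted lists in step and collects the entries present in both.
    public static List<string> FindMatches(List<string> patterns, List<string> words)
    {
        var matches = new List<string>();
        int patternIdx = 0, wordIdx = 0;
        while (patternIdx < patterns.Count && wordIdx < words.Count)
        {
            int comparison = string.Compare(patterns[patternIdx], words[wordIdx]);
            if (comparison > 0)
            {
                wordIdx++;      // pattern sorts after the word: try the next word
            }
            else if (comparison < 0)
            {
                patternIdx++;   // pattern sorts before the word: try the next pattern
            }
            else
            {
                matches.Add(patterns[patternIdx]);
                wordIdx++;
                patternIdx++;
            }
        }
        return matches;
    }

    public static void Main()
    {
        var text = "substring1 is the substring2 document that was processed electronically";
        var words = text.Split(' ').OrderBy(x => x).Distinct().ToList();
        var patterns = new List<string> { "substring1", "substring2", "substring3", "substring4", "substring5" };

        Console.WriteLine(string.Join(", ", FindMatches(patterns, words)));
    }
}
```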


For completeness, here is the full code:

var Text = "substring1 is the substring2 document that was processed electronically";
var words = Text.Split(" ").OrderBy(x => x).Distinct().ToList();
var patterns = new List<string> {  "substring1", "substring2", "substring3", "substring4", "substring5" };

List<string> matches = new();
int patternIdx = 0, wordIdx = 0;
while(patternIdx < patterns.Count && wordIdx < words.Count)
{
    int comparison = string.Compare(patterns[patternIdx], words[wordIdx]);
    switch(comparison)
    {
        case > 0: wordIdx++; break;
        case < 0: patternIdx++; break;
        default: 
        {
            matches.Add(patterns[patternIdx]); 
            wordIdx++;
            patternIdx++;
            break;
        }
    }
}
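
Two caveats apply to the loop above. It only finds patterns that appear as whole space-separated words (a pattern such as "sub" would be found by Contains but not here), and it relies on both lists being sorted under the same comparison. The patterns in the example happen to be sorted already. A hedged variant that does not depend on that could sort both sides explicitly with an ordinal comparer; the shuffled pattern list and the sortedWords / sortedPatterns names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var text = "substring1 is the substring2 document that was processed electronically";

// Deliberately unsorted patterns (hypothetical input).
var patterns = new List<string> { "substring5", "substring2", "substring4", "substring1", "substring3" };

// Sort both sides with the same ordinal comparer so the merge-style walk is valid.
var sortedWords = text.Split(' ').Distinct().OrderBy(w => w, StringComparer.Ordinal).ToList();
var sortedPatterns = patterns.Distinct().OrderBy(p => p, StringComparer.Ordinal).ToList();

List<string> matches = new();
int patternIdx = 0, wordIdx = 0;
while (patternIdx < sortedPatterns.Count && wordIdx < sortedWords.Count)
{
    // Compare with the same ordinal rules that were used for sorting.
    int comparison = string.CompareOrdinal(sortedPatterns[patternIdx], sortedWords[wordIdx]);
    switch (comparison)
    {
        case > 0: wordIdx++; break;      // pattern sorts after the word
        case < 0: patternIdx++; break;   // pattern sorts before the word
        default:
            matches.Add(sortedPatterns[patternIdx]);
            wordIdx++;
            patternIdx++;
            break;
    }
}

Console.WriteLine(string.Join(", ", matches)); // prints: substring1, substring2
```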
