C# RegEx: How to match a string only inside the middle words of a text line?
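Using C# RegEx, I need to match the string "123" inside the middle words of the text line "123xxx123 123 123xxx xxx123xxx xxx123 123xxx123".
It should match only the inner "123"s, not the ones in the first and last words:
"123xxx123 [123] [123]xxx xxx[123]xxx xxx[123] 123xxx123".
I tried negative lookahead/lookbehind, to no avail.
Basically, I need to support a find utility with options for finding matches that equal, or lie within, the starting word, the middle words, the final word, or anywhere in a line (a match can span multiple words).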
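Before writing any regex, it can help to pin down the expected result with a plain reference implementation: take the whitespace-separated words, skip the first and the last, and collect every occurrence of the search string in the rest. The sketch below is a small top-level C# program; `MiddleWordHits` is a hypothetical helper name, and it assumes the search string contains no whitespace, so it only serves as a test oracle for single-word searches.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical reference helper (no lookarounds): start index of every occurrence of `text`
// inside a middle word, i.e. any whitespace-separated word except the first and the last.
// Assumes `text` itself contains no whitespace.
static List<int> MiddleWordHits(string line, string text)
{
    var hits = new List<int>();
    if (text.Length == 0) return hits; // nothing sensible to search for

    MatchCollection words = Regex.Matches(line, @"\S+"); // each word together with its position
    for (int w = 1; w < words.Count - 1; w++)             // skip the first and the last word
    {
        Match word = words[w];
        int i = word.Value.IndexOf(text, StringComparison.Ordinal);
        while (i >= 0)
        {
            hits.Add(word.Index + i);
            // Continue after this occurrence (non-overlapping, like Regex.Matches).
            i = word.Value.IndexOf(text, i + text.Length, StringComparison.Ordinal);
        }
    }
    return hits;
}

string sample = "123xxx123 123 123xxx xxx123xxx xxx123 123xxx123";
Console.WriteLine(string.Join(", ", MiddleWordHits(sample, "123"))); // prints 10, 14, 24, 34
```

The four indexes are exactly the bracketed "123"s in the expected output; nothing from the first or last word is reported.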
```csharp
// Building blocks (`text` is the search string)
string pattern_empty_line = @"(" + @"^$" + @")";
string pattern_whole_line = @"(" + @"^" + text + @"$" + @")";
string pattern_whole_word = @"(" + @"\b" + text + @"\b" + @")";
string pattern_prefix = @"(" + @"\S+?" + text + @")";
string pattern_suffix = @"(" + text + @"\S+?" + @")";
string pattern_prefix_and_suffix = @"(" + @"\S+?" + text + @"\S+?" + @")";
// Any Wordness
string pattern_anywordness_start = @"(" + pattern_whole_line + "|"
    + @"(" + @"^" + pattern_whole_word + @")" + "|"
    + @"(" + @"^" + pattern_prefix + @")" + "|"
    + @"(" + @"^" + pattern_suffix + @")" + "|"
    + @"(" + @"^" + pattern_prefix_and_suffix + @")"
    + @")";
string pattern_anywordness_end = @"(" + pattern_whole_line + "|"
    + @"(" + pattern_whole_word + @"$" + @")" + "|"
    + @"(" + pattern_prefix + @"$" + @")" + "|"
    + @"(" + pattern_suffix + @"$" + @")" + "|"
    + @"(" + pattern_prefix_and_suffix + @"$" + @")"
    + @")";
string pattern_anywordness_not_middle = @"(" + pattern_whole_line + "|" + pattern_anywordness_start + "|" + pattern_anywordness_end + @")";
string pattern_anywordness_middle = @"(" + @"\b" + @".*" + text + @".*" + @"\b" + @")";
string pattern_anywordness_anywhere = @"(" + text + @")";
// Part of word
string pattern_partword_start = @"(" + pattern_prefix + "|" + @"^" + pattern_prefix_and_suffix + @")";
string pattern_partword_middle = @"(" + @"(?<!^)" + pattern_prefix_and_suffix + @"(?!$)" + @")";
string pattern_partword_end = @"(" + pattern_prefix_and_suffix + @"$" + "|" + pattern_suffix + @")";
string pattern_partword_anywhere = @"(" + pattern_partword_start + "|" + pattern_partword_middle + "|" + pattern_partword_end + @")";
// Whole word
string pattern_wholeword_start = @"(" + pattern_whole_line + "|" + @"^" + text + @"\b" + @")";
string pattern_wholeword_middle = @"(" + pattern_whole_line + "|" + @"(?<!^)" + @"\b" + text + @"\b" + @"(?!$)" + @")";
string pattern_wholeword_end = @"(" + pattern_whole_line + "|" + @"\b" + text + @"$" + @")";
string pattern_wholeword_anywhere = @"(" + pattern_wholeword_start + "|" + pattern_wholeword_middle + "|" + pattern_wholeword_end + @")";
```
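To see where two of these attempts go wrong, the small top-level C# program below rebuilds `pattern_anywordness_middle` and `pattern_wholeword_middle` exactly as written above, with `text = "123"`, and runs them against the sample line. The variable names `greedy` and `wholeWord` are only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Rebuild two of the question's patterns for text = "123", exactly as written above.
string text = "123";
string sample = "123xxx123 123 123xxx xxx123xxx xxx123 123xxx123";

string pattern_whole_line = @"(" + @"^" + text + @"$" + @")";
string pattern_anywordness_middle = @"(" + @"\b" + @".*" + text + @".*" + @"\b" + @")";
string pattern_wholeword_middle = @"(" + pattern_whole_line + "|" + @"(?<!^)" + @"\b" + text + @"\b" + @"(?!$)" + @")";

// ".*" can cross spaces, so the single match runs from the start of the line to its end.
MatchCollection greedy = Regex.Matches(sample, pattern_anywordness_middle);
Console.WriteLine($"anywordness_middle: {greedy.Count} match, length {greedy[0].Length} of {sample.Length}");

// "\b123\b" needs a word boundary on both sides, so 123xxx, xxx123xxx and xxx123 are rejected.
MatchCollection wholeWord = Regex.Matches(sample, pattern_wholeword_middle);
foreach (Match m in wholeWord)
    Console.WriteLine($"wholeword_middle: match at index {m.Index}");
```

The first pattern returns one match covering the whole line, because greedy `.*` is free to run across spaces and `\b` holds at both ends of the line. The second finds only the standalone `123` at index 10, since `\b123\b` refuses a `123` that is glued to other word characters.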
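I was able to match everything except the middle words, and I can even match inside "NOT middle words" (see the code above). It would also be nice to find matches in "NOT starting words" and "NOT final words".
Also, the required match may itself span multiple words, so please take that into account.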
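One way to cover all of these position options, including "NOT starting word" and "NOT final word", is to leave the search text alone and surround it with zero-width guards. .NET accepts variable-length lookbehind, so `(?<=\S\s+\S*)` can assert that an earlier word exists and `(?=\S*\s+\S)` that a later word exists; "middle" is simply both together. The top-level C# program below is a sketch of that idea, not a finished utility: `BuildPattern`, `Hits`, and the position names are hypothetical, it assumes the line has no leading or trailing whitespace, and it escapes each search word and joins the words with `\s+` so a multi-word search still matches.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: builds a pattern that finds `text` only in the requested word position.
// Assumes words are runs of non-whitespace and the line has no leading/trailing whitespace.
static string BuildPattern(string text, string position)
{
    // Split the search text into words, escape each one, and allow any whitespace run between them.
    string[] words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    if (words.Length == 0) throw new ArgumentException("search text is empty", nameof(text));
    string body = string.Join(@"\s+", words.Select(Regex.Escape));

    // Zero-width guards; .NET permits the variable-length lookbehind used here.
    const string NotFirst = @"(?<=\S\s+\S*)"; // an earlier word exists
    const string NotLast = @"(?=\S*\s+\S)";   // a later word exists

    var (before, after) = position switch
    {
        "start" => (@"(?<=^\S*)", ""),   // match begins inside the first word
        "final" => ("", @"(?=\S*$)"),    // match ends inside the last word
        "middle" => (NotFirst, NotLast),
        "not-start" => (NotFirst, ""),
        "not-final" => ("", NotLast),
        "anywhere" => ("", ""),
        _ => throw new ArgumentException("unknown position: " + position, nameof(position)),
    };
    return before + body + after;
}

// Hypothetical helper: comma-separated start indexes of all matches.
static string Hits(string line, string pattern) =>
    string.Join(",", Regex.Matches(line, pattern).Cast<Match>().Select(m => m.Index));

string sample = "123xxx123 123 123xxx xxx123xxx xxx123 123xxx123";
foreach (string position in new[] { "start", "middle", "final", "not-start", "not-final", "anywhere" })
{
    string found = Hits(sample, BuildPattern("123", position));
    Console.WriteLine($"{position,-10} {found}");
}
string multi = Hits(sample, BuildPattern("123 123xxx", "middle"));
Console.WriteLine("middle, multi-word: " + multi);
```

On the sample line this reports `0,6` for start, `10,14,24,34` for middle, `38,44` for final, `10,14,24,34,38,44` for not-start and `0,6,10,14,24,34` for not-final. Because the guards only inspect the characters around the match, the multi-word search `"123 123xxx"` with `"middle"` matches at index 10 (words two and three) but not at index 34, where it would run into the last word.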
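Finally, I managed to solve my own problem.
I just needed to add `\s` before and after each of the whole-word, prefix, suffix, and prefix-and-suffix search patterns: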
```csharp
string pattern_anywordness_middle = @"(" + pattern_whole_line + "|"
    + @"(" + @"\s" + pattern_whole_word + @"\s" + @")" + "|"
    + @"(" + @"\s" + pattern_prefix + @"\s" + @")" + "|"
    + @"(" + @"\s" + pattern_suffix + @"\s" + @")" + "|"
    + @"(" + @"\s" + pattern_prefix_and_suffix + @"\s" + @")"
    + @")";
```
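One caveat with the `\s` version: each `\s` consumes its separator, so when two neighbouring middle words both contain the search text, the first match eats the space the second one needs. On the sample line it returns only `" 123 "` and `" xxx123xxx "`, missing `123xxx` and `xxx123`. Replacing the consuming `\s` with the zero-width lookarounds `(?<=\s)` and `(?=\s)` keeps the same structure but leaves the spaces available. The top-level C# program below compares the two, rebuilding the question's building blocks with `text = "123"`; `Middle` is a hypothetical helper name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "123";
string sample = "123xxx123 123 123xxx xxx123xxx xxx123 123xxx123";

// Building blocks copied from the question.
string pattern_whole_line = @"(" + @"^" + text + @"$" + @")";
string pattern_whole_word = @"(" + @"\b" + text + @"\b" + @")";
string pattern_prefix = @"(" + @"\S+?" + text + @")";
string pattern_suffix = @"(" + text + @"\S+?" + @")";
string pattern_prefix_and_suffix = @"(" + @"\S+?" + text + @"\S+?" + @")";

// Hypothetical helper: the answer's middle-word pattern with a configurable separator check.
// consume == true  -> "\s" on both sides, as in the answer (the spaces become part of the match);
// consume == false -> zero-width (?<=\s) / (?=\s), so one space can border two matched words.
string Middle(bool consume)
{
    string pre = consume ? @"\s" : @"(?<=\s)";
    string post = consume ? @"\s" : @"(?=\s)";
    return @"(" + pattern_whole_line + "|"
        + @"(" + pre + pattern_whole_word + post + @")" + "|"
        + @"(" + pre + pattern_prefix + post + @")" + "|"
        + @"(" + pre + pattern_suffix + post + @")" + "|"
        + @"(" + pre + pattern_prefix_and_suffix + post + @")"
        + @")";
}

foreach (bool consume in new[] { true, false })
{
    string label = consume ? "consuming \\s" : "lookarounds";
    var found = Regex.Matches(sample, Middle(consume)).Cast<Match>().Select(m => "\"" + m.Value + "\"");
    Console.WriteLine(label + ": " + string.Join(" ", found));
}
```

The lookaround variant returns all four middle words (`123`, `123xxx`, `xxx123xxx`, `xxx123`) without the surrounding spaces. Like the original, it treats any word that has whitespace on both sides as a middle word, so a line with leading or trailing whitespace would also expose its first or last word.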