使用通配符匹配字符串
Matching strings with wildcard
我想用通配符 (*) 匹配字符串,其中通配符表示 "any"。例如:
*X = string must end with X
X* = string must start with X
*X* = string must contain X
此外,还有一些复合用途,例如:
*X*YZ* = string contains X and contains YZ
X*YZ*P = string starts with X, contains YZ and ends with P.
有没有简单的算法可以做到这一点?我不确定是否使用正则表达式(尽管有可能)。
需要说明的是,用户会将上面的内容输入过滤框(尽可能简单的过滤器),我不希望他们必须自己编写正则表达式。所以我可以很容易地从上面的符号转换出来的东西会很好。
*X*YZ* = string contains X and contains YZ
@".*X.*YZ"
X*YZ*P = string starts with X, contains YZ and ends with P.
@"^X.*YZ.*P$"
您可以使用 VB.NET Like-Operator:
string text = "x is not the same as X and yz not the same as YZ";
bool contains = LikeOperator.LikeString(text,"*X*YZ*", Microsoft.VisualBasic.CompareMethod.Binary);
如果您想忽略大小写,请使用 CompareMethod.Text
。
您需要添加 using Microsoft.VisualBasic.CompilerServices;
和 add a reference to the Microsoft.VisualBasic.dll
。
因为它是 .NET 框架的一部分,而且永远都是,所以使用它不是问题 class。
通配符 *
可以翻译为 .*
或 .*?
正则表达式模式。
您可能需要使用单行模式来匹配换行符,在这种情况下,您可以使用 (?s)
作为正则表达式模式的一部分。
您可以为整个或部分模式设置:
X* = > @"X(?s:.*)"
*X = > @"(?s:.*)X"
*X* = > @"(?s).*X.*"
*X*YZ* = > @"(?s).*X.*YZ.*"
X*YZ*P = > @"(?s:X.*YZ.*P)"
通配符通常与两种类型的王牌一起运作:
? - any character (one and only one)
* - any characters (zero or more)
因此您可以轻松地将这些规则转换为适当的正则表达式n:
// If you want to implement both "*" and "?"
private static String WildCardToRegular(String value) {
return "^" + Regex.Escape(value).Replace("\?", ".").Replace("\*", ".*") + "$";
}
// If you want to implement "*" only
private static String WildCardToRegular(String value) {
return "^" + Regex.Escape(value).Replace("\*", ".*") + "$";
}
然后你可以像往常一样使用 Regex:
String test = "Some Data X";
Boolean endsWithEx = Regex.IsMatch(test, WildCardToRegular("*X"));
Boolean startsWithS = Regex.IsMatch(test, WildCardToRegular("S*"));
Boolean containsD = Regex.IsMatch(test, WildCardToRegular("*D*"));
// Starts with S, ends with X, contains "me" and "a" (in that order)
Boolean complex = Regex.IsMatch(test, WildCardToRegular("S*me*a*X"));
使用 System.Management.Automation
中的 WildcardPattern
可能是一种选择。
pattern = new WildcardPattern(patternString);
pattern.IsMatch(stringToMatch);
Visual Studio UI 可能不允许您将 System.Management.Automation
程序集添加到项目的引用中。随意手动添加它,如here.
所述
有必要考虑到,当检查与 Y* 的匹配时,Regex IsMatch 给出 XYZ 为真。为了避免它,我使用“^”锚
isMatch(str1, "^" + str2.Replace("*", ".*?"));
所以,解决您问题的完整代码是
bool isMatchStr(string str1, string str2)
{
string s1 = str1.Replace("*", ".*?");
string s2 = str2.Replace("*", ".*?");
bool r1 = Regex.IsMatch(s1, "^" + s2);
bool r2 = Regex.IsMatch(s2, "^" + s1);
return r1 || r2;
}
C# 控制台应用程序示例
Command line Sample:
C:/> App_Exe -Opy PythonFile.py 1 2 3
Console output:
Argument list: -Opy PythonFile.py 1 2 3
Found python filename: PythonFile.py
using System;
using System.Text.RegularExpressions; //Regex
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
string cmdLine = String.Join(" ", args);
bool bFileExtFlag = false;
int argIndex = 0;
Regex regex;
foreach (string s in args)
{
//Search for the 1st occurrence of the "*.py" pattern
regex = new Regex(@"(?s:.*)6py", RegexOptions.IgnoreCase);
bFileExtFlag = regex.IsMatch(s);
if (bFileExtFlag == true)
break;
argIndex++;
};
Console.WriteLine("Argument list: " + cmdLine);
if (bFileExtFlag == true)
Console.WriteLine("Found python filename: " + args[argIndex]);
else
Console.WriteLine("Python file with extension <.py> not found!");
}
}
}
public class Wildcard
{
private readonly string _pattern;
public Wildcard(string pattern)
{
_pattern = pattern;
}
public static bool Match(string value, string pattern)
{
int start = -1;
int end = -1;
return Match(value, pattern, ref start, ref end);
}
public static bool Match(string value, string pattern, char[] toLowerTable)
{
int start = -1;
int end = -1;
return Match(value, pattern, ref start, ref end, toLowerTable);
}
public static bool Match(string value, string pattern, ref int start, ref int end)
{
return new Wildcard(pattern).IsMatch(value, ref start, ref end);
}
public static bool Match(string value, string pattern, ref int start, ref int end, char[] toLowerTable)
{
return new Wildcard(pattern).IsMatch(value, ref start, ref end, toLowerTable);
}
public bool IsMatch(string str)
{
int start = -1;
int end = -1;
return IsMatch(str, ref start, ref end);
}
public bool IsMatch(string str, char[] toLowerTable)
{
int start = -1;
int end = -1;
return IsMatch(str, ref start, ref end, toLowerTable);
}
public bool IsMatch(string str, ref int start, ref int end)
{
if (_pattern.Length == 0) return false;
int pindex = 0;
int sindex = 0;
int pattern_len = _pattern.Length;
int str_len = str.Length;
start = -1;
while (true)
{
bool star = false;
if (_pattern[pindex] == '*')
{
star = true;
do
{
pindex++;
}
while (pindex < pattern_len && _pattern[pindex] == '*');
}
end = sindex;
int i;
while (true)
{
int si = 0;
bool breakLoops = false;
for (i = 0; pindex + i < pattern_len && _pattern[pindex + i] != '*'; i++)
{
si = sindex + i;
if (si == str_len)
{
return false;
}
if (str[si] == _pattern[pindex + i])
{
continue;
}
if (si == str_len)
{
return false;
}
if (_pattern[pindex + i] == '?' && str[si] != '.')
{
continue;
}
breakLoops = true;
break;
}
if (breakLoops)
{
if (!star)
{
return false;
}
sindex++;
if (si == str_len)
{
return false;
}
}
else
{
if (start == -1)
{
start = sindex;
}
if (pindex + i < pattern_len && _pattern[pindex + i] == '*')
{
break;
}
if (sindex + i == str_len)
{
if (end <= start)
{
end = str_len;
}
return true;
}
if (i != 0 && _pattern[pindex + i - 1] == '*')
{
return true;
}
if (!star)
{
return false;
}
sindex++;
}
}
sindex += i;
pindex += i;
if (start == -1)
{
start = sindex;
}
}
}
public bool IsMatch(string str, ref int start, ref int end, char[] toLowerTable)
{
if (_pattern.Length == 0) return false;
int pindex = 0;
int sindex = 0;
int pattern_len = _pattern.Length;
int str_len = str.Length;
start = -1;
while (true)
{
bool star = false;
if (_pattern[pindex] == '*')
{
star = true;
do
{
pindex++;
}
while (pindex < pattern_len && _pattern[pindex] == '*');
}
end = sindex;
int i;
while (true)
{
int si = 0;
bool breakLoops = false;
for (i = 0; pindex + i < pattern_len && _pattern[pindex + i] != '*'; i++)
{
si = sindex + i;
if (si == str_len)
{
return false;
}
char c = toLowerTable[str[si]];
if (c == _pattern[pindex + i])
{
continue;
}
if (si == str_len)
{
return false;
}
if (_pattern[pindex + i] == '?' && c != '.')
{
continue;
}
breakLoops = true;
break;
}
if (breakLoops)
{
if (!star)
{
return false;
}
sindex++;
if (si == str_len)
{
return false;
}
}
else
{
if (start == -1)
{
start = sindex;
}
if (pindex + i < pattern_len && _pattern[pindex + i] == '*')
{
break;
}
if (sindex + i == str_len)
{
if (end <= start)
{
end = str_len;
}
return true;
}
if (i != 0 && _pattern[pindex + i - 1] == '*')
{
return true;
}
if (!star)
{
return false;
}
sindex++;
continue;
}
}
sindex += i;
pindex += i;
if (start == -1)
{
start = sindex;
}
}
}
}
对于使用 .NET Core 2.1+ 或 .NET 5+ 的用户,您可以在 System.IO.Enumeration 命名空间中使用 FileSystemName.MatchesSimpleExpression 方法。
string text = "X is a string with ZY in the middle and at the end is P";
bool isMatch = FileSystemName.MatchesSimpleExpression("X*ZY*P", text);
两个参数实际上都是 ReadOnlySpan<char>
,但您也可以使用字符串参数。如果你想打开 on/off 大小写匹配,还有一个重载方法。正如 Chris 在评论中提到的,它默认不区分大小写。
支持那些带有 C#+Excel 的代码(对于部分已知的 WS 名称)但不仅如此 - 这是我的带有通配符 (ddd*) 的代码。
简而言之:代码获取所有 WS 名称,如果今天的工作日 (ddd) 与 WS 名称的前 3 个字母 (bool=true) 匹配,那么它会将其转换为从循环中提取的字符串。
using System;
using Microsoft.Office.Interop.Excel;
using System.Runtime.InteropServices;
using Range = Microsoft.Office.Interop.Excel.Range;
using System.Diagnostics;
using System.Reflection;
using System.IO;
using System.Text.RegularExpressions;
...
string weekDay = DateTime.Now.ToString("ddd*");
Workbook sourceWorkbook4 = xlApp.Workbooks.Open(LrsIdWorkbook, 0, false, 5, "", "", true, XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
Workbook destinationWorkbook = xlApp.Workbooks.Open(masterWB, 0, false, 5, "", "", true, XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
static String WildCardToRegular(String value)
{
return "^" + Regex.Escape(value).Replace("\*", ".*") + "$";
}
string wsName = null;
foreach (Worksheet works in sourceWorkbook4.Worksheets)
{
Boolean startsWithddd = Regex.IsMatch(works.Name, WildCardToRegular(weekDay + "*"));
if (startsWithddd == true)
{
wsName = works.Name.ToString();
}
}
Worksheet sourceWorksheet4 = (Worksheet)sourceWorkbook4.Worksheets.get_Item(wsName);
...
我想用通配符 (*) 匹配字符串,其中通配符表示 "any"。例如:
*X = string must end with X
X* = string must start with X
*X* = string must contain X
此外,还有一些复合用途,例如:
*X*YZ* = string contains X and contains YZ
X*YZ*P = string starts with X, contains YZ and ends with P.
有没有简单的算法可以做到这一点?我不确定是否使用正则表达式(尽管有可能)。
需要说明的是,用户会将上面的内容输入过滤框(尽可能简单的过滤器),我不希望他们必须自己编写正则表达式。所以我可以很容易地从上面的符号转换出来的东西会很好。
*X*YZ* = string contains X and contains YZ
@".*X.*YZ"
X*YZ*P = string starts with X, contains YZ and ends with P.
@"^X.*YZ.*P$"
您可以使用 VB.NET Like-Operator:
string text = "x is not the same as X and yz not the same as YZ";
bool contains = LikeOperator.LikeString(text,"*X*YZ*", Microsoft.VisualBasic.CompareMethod.Binary);
如果您想忽略大小写,请使用 CompareMethod.Text
。
您需要添加 using Microsoft.VisualBasic.CompilerServices;
和 add a reference to the Microsoft.VisualBasic.dll
。
因为它是 .NET 框架的一部分,而且永远都是,所以使用它不是问题 class。
通配符 *
可以翻译为 .*
或 .*?
正则表达式模式。
您可能需要使用单行模式来匹配换行符,在这种情况下,您可以使用 (?s)
作为正则表达式模式的一部分。
您可以为整个或部分模式设置:
X* = > @"X(?s:.*)"
*X = > @"(?s:.*)X"
*X* = > @"(?s).*X.*"
*X*YZ* = > @"(?s).*X.*YZ.*"
X*YZ*P = > @"(?s:X.*YZ.*P)"
通配符通常与两种类型的王牌一起运作:
? - any character (one and only one)
* - any characters (zero or more)
因此您可以轻松地将这些规则转换为适当的正则表达式n:
// If you want to implement both "*" and "?"
private static String WildCardToRegular(String value) {
return "^" + Regex.Escape(value).Replace("\?", ".").Replace("\*", ".*") + "$";
}
// If you want to implement "*" only
private static String WildCardToRegular(String value) {
return "^" + Regex.Escape(value).Replace("\*", ".*") + "$";
}
然后你可以像往常一样使用 Regex:
String test = "Some Data X";
Boolean endsWithEx = Regex.IsMatch(test, WildCardToRegular("*X"));
Boolean startsWithS = Regex.IsMatch(test, WildCardToRegular("S*"));
Boolean containsD = Regex.IsMatch(test, WildCardToRegular("*D*"));
// Starts with S, ends with X, contains "me" and "a" (in that order)
Boolean complex = Regex.IsMatch(test, WildCardToRegular("S*me*a*X"));
使用 System.Management.Automation
中的 WildcardPattern
可能是一种选择。
pattern = new WildcardPattern(patternString);
pattern.IsMatch(stringToMatch);
Visual Studio UI 可能不允许您将 System.Management.Automation
程序集添加到项目的引用中。随意手动添加它,如here.
有必要考虑到,当检查与 Y* 的匹配时,Regex IsMatch 给出 XYZ 为真。为了避免它,我使用“^”锚
isMatch(str1, "^" + str2.Replace("*", ".*?"));
所以,解决您问题的完整代码是
bool isMatchStr(string str1, string str2)
{
string s1 = str1.Replace("*", ".*?");
string s2 = str2.Replace("*", ".*?");
bool r1 = Regex.IsMatch(s1, "^" + s2);
bool r2 = Regex.IsMatch(s2, "^" + s1);
return r1 || r2;
}
C# 控制台应用程序示例
Command line Sample:
C:/> App_Exe -Opy PythonFile.py 1 2 3
Console output:
Argument list: -Opy PythonFile.py 1 2 3
Found python filename: PythonFile.py
using System;
using System.Text.RegularExpressions; //Regex
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
string cmdLine = String.Join(" ", args);
bool bFileExtFlag = false;
int argIndex = 0;
Regex regex;
foreach (string s in args)
{
//Search for the 1st occurrence of the "*.py" pattern
regex = new Regex(@"(?s:.*)6py", RegexOptions.IgnoreCase);
bFileExtFlag = regex.IsMatch(s);
if (bFileExtFlag == true)
break;
argIndex++;
};
Console.WriteLine("Argument list: " + cmdLine);
if (bFileExtFlag == true)
Console.WriteLine("Found python filename: " + args[argIndex]);
else
Console.WriteLine("Python file with extension <.py> not found!");
}
}
}
public class Wildcard
{
private readonly string _pattern;
public Wildcard(string pattern)
{
_pattern = pattern;
}
public static bool Match(string value, string pattern)
{
int start = -1;
int end = -1;
return Match(value, pattern, ref start, ref end);
}
public static bool Match(string value, string pattern, char[] toLowerTable)
{
int start = -1;
int end = -1;
return Match(value, pattern, ref start, ref end, toLowerTable);
}
public static bool Match(string value, string pattern, ref int start, ref int end)
{
return new Wildcard(pattern).IsMatch(value, ref start, ref end);
}
public static bool Match(string value, string pattern, ref int start, ref int end, char[] toLowerTable)
{
return new Wildcard(pattern).IsMatch(value, ref start, ref end, toLowerTable);
}
public bool IsMatch(string str)
{
int start = -1;
int end = -1;
return IsMatch(str, ref start, ref end);
}
public bool IsMatch(string str, char[] toLowerTable)
{
int start = -1;
int end = -1;
return IsMatch(str, ref start, ref end, toLowerTable);
}
public bool IsMatch(string str, ref int start, ref int end)
{
if (_pattern.Length == 0) return false;
int pindex = 0;
int sindex = 0;
int pattern_len = _pattern.Length;
int str_len = str.Length;
start = -1;
while (true)
{
bool star = false;
if (_pattern[pindex] == '*')
{
star = true;
do
{
pindex++;
}
while (pindex < pattern_len && _pattern[pindex] == '*');
}
end = sindex;
int i;
while (true)
{
int si = 0;
bool breakLoops = false;
for (i = 0; pindex + i < pattern_len && _pattern[pindex + i] != '*'; i++)
{
si = sindex + i;
if (si == str_len)
{
return false;
}
if (str[si] == _pattern[pindex + i])
{
continue;
}
if (si == str_len)
{
return false;
}
if (_pattern[pindex + i] == '?' && str[si] != '.')
{
continue;
}
breakLoops = true;
break;
}
if (breakLoops)
{
if (!star)
{
return false;
}
sindex++;
if (si == str_len)
{
return false;
}
}
else
{
if (start == -1)
{
start = sindex;
}
if (pindex + i < pattern_len && _pattern[pindex + i] == '*')
{
break;
}
if (sindex + i == str_len)
{
if (end <= start)
{
end = str_len;
}
return true;
}
if (i != 0 && _pattern[pindex + i - 1] == '*')
{
return true;
}
if (!star)
{
return false;
}
sindex++;
}
}
sindex += i;
pindex += i;
if (start == -1)
{
start = sindex;
}
}
}
public bool IsMatch(string str, ref int start, ref int end, char[] toLowerTable)
{
if (_pattern.Length == 0) return false;
int pindex = 0;
int sindex = 0;
int pattern_len = _pattern.Length;
int str_len = str.Length;
start = -1;
while (true)
{
bool star = false;
if (_pattern[pindex] == '*')
{
star = true;
do
{
pindex++;
}
while (pindex < pattern_len && _pattern[pindex] == '*');
}
end = sindex;
int i;
while (true)
{
int si = 0;
bool breakLoops = false;
for (i = 0; pindex + i < pattern_len && _pattern[pindex + i] != '*'; i++)
{
si = sindex + i;
if (si == str_len)
{
return false;
}
char c = toLowerTable[str[si]];
if (c == _pattern[pindex + i])
{
continue;
}
if (si == str_len)
{
return false;
}
if (_pattern[pindex + i] == '?' && c != '.')
{
continue;
}
breakLoops = true;
break;
}
if (breakLoops)
{
if (!star)
{
return false;
}
sindex++;
if (si == str_len)
{
return false;
}
}
else
{
if (start == -1)
{
start = sindex;
}
if (pindex + i < pattern_len && _pattern[pindex + i] == '*')
{
break;
}
if (sindex + i == str_len)
{
if (end <= start)
{
end = str_len;
}
return true;
}
if (i != 0 && _pattern[pindex + i - 1] == '*')
{
return true;
}
if (!star)
{
return false;
}
sindex++;
continue;
}
}
sindex += i;
pindex += i;
if (start == -1)
{
start = sindex;
}
}
}
}
对于使用 .NET Core 2.1+ 或 .NET 5+ 的用户,您可以在 System.IO.Enumeration 命名空间中使用 FileSystemName.MatchesSimpleExpression 方法。
string text = "X is a string with ZY in the middle and at the end is P";
bool isMatch = FileSystemName.MatchesSimpleExpression("X*ZY*P", text);
两个参数实际上都是 ReadOnlySpan<char>
,但您也可以使用字符串参数。如果你想打开 on/off 大小写匹配,还有一个重载方法。正如 Chris 在评论中提到的,它默认不区分大小写。
支持那些带有 C#+Excel 的代码(对于部分已知的 WS 名称)但不仅如此 - 这是我的带有通配符 (ddd*) 的代码。 简而言之:代码获取所有 WS 名称,如果今天的工作日 (ddd) 与 WS 名称的前 3 个字母 (bool=true) 匹配,那么它会将其转换为从循环中提取的字符串。
using System;
using Microsoft.Office.Interop.Excel;
using System.Runtime.InteropServices;
using Range = Microsoft.Office.Interop.Excel.Range;
using System.Diagnostics;
using System.Reflection;
using System.IO;
using System.Text.RegularExpressions;
...
string weekDay = DateTime.Now.ToString("ddd*");
Workbook sourceWorkbook4 = xlApp.Workbooks.Open(LrsIdWorkbook, 0, false, 5, "", "", true, XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
Workbook destinationWorkbook = xlApp.Workbooks.Open(masterWB, 0, false, 5, "", "", true, XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
static String WildCardToRegular(String value)
{
return "^" + Regex.Escape(value).Replace("\*", ".*") + "$";
}
string wsName = null;
foreach (Worksheet works in sourceWorkbook4.Worksheets)
{
Boolean startsWithddd = Regex.IsMatch(works.Name, WildCardToRegular(weekDay + "*"));
if (startsWithddd == true)
{
wsName = works.Name.ToString();
}
}
Worksheet sourceWorksheet4 = (Worksheet)sourceWorkbook4.Worksheets.get_Item(wsName);
...