RegEx for matching dates (Month Day, Year OR m/d/yy)
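I am trying to write a regular expression that can find dates in a string, where the date may be preceded (or followed) by spaces, numbers, text, end of line, etc. The expression should handle US date formats, either
1) Month Name Day, Year - e.g. January 10, 2019, or
2) mm/dd/yy - e.g. 11/30/19
I found this for Month Name Day, Year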
(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}
(thanks to Veverke)
and this for mm/dd/yy (and various combinations of m/d/y)
(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}
(thanks to Steven Levithan and Jan Goyvaerts: https://www.oreilly.com/library/view/regular-expressions-cookbook/9781449327453/ch04s04.html)
I tried combining them like this
((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4})|((1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2})
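When I search the input string "Paid on 1/1/2019" for "on [regex above]", it finds the date but not the word "on". The whole string, including "on", is found correctly if I use only the mm/dd/yy expression after "on":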
(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}
Can anyone see what I am doing wrong?
Edit
I am using the C# .NET code below:
string stringToSearch = "Paid on 1/1/2019";
string searchPattern = @"on ((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4})|((1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2})";
var match = Regex.Match(stringToSearch, searchPattern, RegexOptions.IgnoreCase);
string foundString;
if (match.Success)
    foundString = stringToSearch.Substring(match.Index, match.Length);
For example
string searchPattern = @"on ((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4})|((1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2})";
stringToSearch = "Paid on Jan 1, 2019";
found = "on Jan 1, 2019" -- worked as expected, found the word "on" and the date
stringToSearch = "Paid on 1/1/2019";
found = "1/1/2019" -- did not work as expected, found the date but did not include the word "on"
If I reverse the pattern
string searchPattern = @"on ((1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4})";
stringToSearch = "Paid on Jan 1, 2019";
found = "Jan 1, 2019" -- did not work as expected, found the date but did not include the word "on"
stringToSearch = "Paid on 1/1/2019";
found = "on 1/1/2019" -- worked as expected, found the word "on" and the date
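Thanks
Both of your expressions seem fine. If you wish to capture whatever comes before or after your target output, you can simply add two boundaries on the left and right, and that would do it for you. For example, consider this expression: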
(.*)(((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}))(.*)
Here, for instance, you add two groups similar to (.*) and wrap your original expression in a single group, and that does it.
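The results reported in the question are consistent with how alternation is parsed: `|` binds more loosely than concatenation, so `on A|B` is read as `(on A)|(B)`, and the word "on" belongs to the first branch only. Below is a minimal JavaScript sketch of why wrapping both alternatives in one group matters. The names `monthDate`, `slashDate`, `ungrouped` and `grouped` are made up for illustration, and the non-capturing group `(?:...)` is just one possible way to do the wrapping.

```javascript
// The two date sub-patterns from the question, stored as strings so they can be combined.
const monthDate = String.raw`(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}`;
const slashDate = String.raw`(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}`;

// Ungrouped, as in the question: parsed as ("on " + monthDate) OR (slashDate),
// so the "on " prefix only applies to the first branch.
const ungrouped = new RegExp(`on (${monthDate})|(${slashDate})`, "i");

// Grouped: (?:...) limits the scope of the alternation,
// so the "on " prefix applies to both branches.
const grouped = new RegExp(`on (?:${monthDate}|${slashDate})`, "i");

const inputs = ["Paid on Jan 1, 2019", "Paid on 1/1/2019"];
for (const input of inputs) {
  console.log(input);
  console.log("  ungrouped:", input.match(ungrouped)[0]);
  console.log("  grouped:  ", input.match(grouped)[0]);
}
```

With the grouped pattern, both inputs should yield a match that starts with "on ", whichever date format follows.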
C# test
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        // The (.*) groups on either side capture whatever precedes and follows the date.
        string pattern = @"(.*)(((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}))(.*)";
        // Verbatim string: the second line is left unindented so the input is not changed.
        string input = @"Paid on Jan 1, 2019 And anything else that you wish to have after
Paid on 1/1/2019 And anything else that you wish to have after";
        RegexOptions options = RegexOptions.Multiline;

        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}
JavaScript demo
This JavaScript demo shows that your expression works:
const regex = /(.*)(((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}))(.*)/gm;
const str = `Paid on Jan 1, 2019 And anything else that you wish to have after
Paid on 1/1/2019 And anything else that you wish to have after`;
// $1..$4 are replaced by the text captured by groups 1 to 4
const subst = `\nGroup 1: $1\nGroup 2: $2\nGroup 3: $3\nGroup 4: $4`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
Basic performance test
This JavaScript snippet reports the runtime of a for loop that runs one million times, as a rough performance measure.
const repeat = 1000000;
const start = Date.now();
let match;
// Counts down from repeat to 1, so the body runs exactly `repeat` times
for (let i = repeat; i > 0; i--) {
	const string = 'Paid on Jan 1, 2019';
	const regex = /(.*)(((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}))(.*)/gm;
	match = string.replace(regex, "\nGroup #1: $1\nGroup #2: $2\n");
}
const end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. ");
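Improvements
You may wish to reduce the number of capturing groups around the month names; if you like, you can simply put all of them in a single capturing group.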
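One possible reading of that suggestion, sketched in JavaScript: turn the optional month suffixes into non-capturing groups with `(?:...)` and keep a single capturing group around the whole date. The names `month`, `datePattern` and `findDate` are hypothetical and exist only for this illustration.

```javascript
// Month names with non-capturing groups: the optional suffixes no longer create numbered groups.
const month = String.raw`(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)`;
const monthDate = String.raw`${month}\s+\d{1,2},\s+\d{4}`;
const slashDate = String.raw`(?:1[0-2]|0?[1-9])/(?:3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}`;

// Exactly one capturing group: group 1 holds the date, whichever format matched.
const datePattern = new RegExp(`(${monthDate}|${slashDate})`, "i");

// Hypothetical helper: returns the first date found in the text, or null if there is none.
function findDate(text) {
  const m = datePattern.exec(text);
  return m ? m[1] : null;
}

console.log(findDate("Paid on Jan 1, 2019"));
console.log(findDate("Paid on 1/1/2019"));
console.log(findDate("No date here"));
```

With only one capturing group, the group number of the date no longer depends on which alternative matched.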