Regex to filter out Datetime time from string
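Requirement: I receive an email that follows a template, and I need to extract some text from it. I am converting the whole email body to a string.
The email body looks like this: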
some body text which I don't need
Discussion:
Tue 26/04/2022/2:48 PM UTC+10/ ABC User-
TEST description - this should be logged as a comment. --- This is the part I need
ABC Company Australia | XYZ St | Sydney NSW 2000
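I am thinking of using a regex to find:
- the word "Discussion"
- on the next line, a date-time in the format "Tue 26/04/2022/2:48 PM UTC+10/ ABC User-"
- then pick up the following lines until we reach the address line "ABC Company Australia | XYZ St | Sydney NSW 2000"
Is this possible? Can anyone help with the regex?
TIA.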
You can try this regex:
Discussion.*?\n+([A-Za-z]+ +(?:\d{2}\/){2}\d{4}\/\d+:\d+ +[^\n]+)(.*)?ABC Company Australia \| XYZ St \| Sydney NSW 2000
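Explanation:
Discussion.*?\n+
The regex starts matching at the string Discussion; .*?\n+ then consumes any remaining characters on that line and the line break(s) that follow.
([A-Za-z]+ +(?:\d{2}\/){2}\d{4}\/\d+:\d+ +[^\n]+)
Next it looks for the date format you described. [^\n]+ takes everything up to the next line break, so group 1 holds the whole date line.
(.*)?
This captures everything after the date line. It relies on the s flag, which lets . match line breaks as well.
ABC Company Australia \| XYZ St \| Sydney NSW 2000
The match ends at this footer line.
- Group 1 holds the date line, and group 2 holds the body you need.
Source: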
const regex = /Discussion.*?\n+([A-Za-z]+ +(?:\d{2}\/){2}\d{4}\/\d+:\d+ +[^\n]+)(.*)?ABC Company Australia \| XYZ St \| Sydney NSW 2000/gms;
const str = `some body text which I don't need
Discussion:
Tue 26/04/2022/2:48 PM UTC+10/ ABC User-
TEST description - this should be logged as a comment. --- This is the part I need
ABC Company Australia | XYZ St | Sydney NSW 2000
`;
const match = regex.exec(str);
if (match !== null) {
  console.log(match[1]); // the date line
  console.log(match[2]?.trim()); // the body, without the surrounding line breaks
}
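As a variation on the snippet above, the same pattern can be wrapped in a small helper that returns the date line and the trimmed body, or `null` when the template is not found. This is only a sketch: the name `extractDiscussion` is made up for illustration, and it assumes the footer is always exactly the one shown in the question. Only the `s` flag is kept, since there is a single `exec` call and the pattern has no `^`/`$` anchors.

```javascript
// Hypothetical helper (the name is not from the original answer).
// Same pattern as above; the `s` flag lets `.` match "\n" as well.
const discussionRegex =
  /Discussion.*?\n+([A-Za-z]+ +(?:\d{2}\/){2}\d{4}\/\d+:\d+ +[^\n]+)(.*)?ABC Company Australia \| XYZ St \| Sydney NSW 2000/s;

function extractDiscussion(text) {
  const match = discussionRegex.exec(text);
  if (match === null) {
    return null; // the mail does not follow the template
  }
  return {
    dateLine: match[1],
    // group 2 starts and ends with a line break, so trim it
    body: (match[2] ?? '').trim(),
  };
}

const sampleMail = `some body text which I don't need
Discussion:
Tue 26/04/2022/2:48 PM UTC+10/ ABC User-
TEST description - this should be logged as a comment. --- This is the part I need
ABC Company Australia | XYZ St | Sydney NSW 2000
`;

console.log(extractDiscussion(sampleMail));
console.log(extractDiscussion('no template here')); // null
```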
If it is just about the content the OP is interested in, the following regex is sufficient ... /Discussion:\n[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}.*\n+(?<content>.*)/
const multilineMail = `Discussion:
Tue 26/04/2022/2:48 PM UTC+10/ ABC User-
TEST description - this should be logged as a comment. --- This is the part I need
ABC Company Australia | XYZ St | Sydney NSW 2000`;
// see ... [https://regex101.com/r/v8FXCA/3]
const regXMailContent =
/Discussion:\n[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}.*\n+(?<content>.*)/;
console.log(
regXMailContent.exec(multilineMail)?.groups?.content
);
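Note that without the `s` (dotAll) flag, `.` does not match a line break, so `(?<content>.*)` captures exactly one line. The following sketch illustrates this with a made-up two-line comment; the sample text and variable names are assumptions for illustration only.

```javascript
// Invented sample: the comment spans two lines.
const twoLineMail = `Discussion:
Tue 26/04/2022/2:48 PM UTC+10/ ABC User-
first line of the comment
second line of the comment
ABC Company Australia | XYZ St | Sydney NSW 2000`;

const singleLineContentRegex =
  /Discussion:\n[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}.*\n+(?<content>.*)/;

// Only the first line after the date line is captured.
const captured = singleLineContentRegex.exec(twoLineMail)?.groups?.content;
console.log(captured); // "first line of the comment"
```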
If the company footer has to be matched exactly, it needs to become part of the above regex, like this ... /Discussion:\n[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}.*\n+(?<content>.*)\n+ABC Company Australia \| XYZ St \| Sydney NSW 2000/
const multilineMail = `Discussion:
Tue 26/04/2022/2:48 PM UTC+10/ ABC User-
TEST description - this should be logged as a comment. --- This is the part I need
ABC Company Australia | XYZ St | Sydney NSW 2000`;
// see ... [https://regex101.com/r/v8FXCA/4]
const regXMailContent =
/Discussion:\n[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}.*\n+(?<content>.*)\n+ABC Company Australia \| XYZ St \| Sydney NSW 2000/;
console.log(
regXMailContent.exec(multilineMail)?.groups?.content
);
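If the footer differs between senders, the pattern could be assembled at runtime instead of being hard-coded. A minimal sketch, assuming the footer is available as a plain string; `escapeRegExp` and `buildContentRegex` are hypothetical helper names, not part of the original answer.

```javascript
// Escape characters that have a special meaning in a regex (e.g. the "|" in the footer).
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build the single-line-content pattern for an arbitrary footer string.
// In a RegExp built from a string, "/" does not need escaping.
function buildContentRegex(footer) {
  return new RegExp(
    String.raw`Discussion:\n[a-zA-Z]{1,3}\s+\d{2}/\d{2}/\d{4}.*\n+(?<content>.*)\n+` +
      escapeRegExp(footer)
  );
}

const footer = 'ABC Company Australia | XYZ St | Sydney NSW 2000';
const mailWithFooter = `Discussion:
Tue 26/04/2022/2:48 PM UTC+10/ ABC User-
TEST description - this should be logged as a comment. --- This is the part I need
ABC Company Australia | XYZ St | Sydney NSW 2000`;

const footerBoundContent =
  buildContentRegex(footer).exec(mailWithFooter)?.groups?.content;
console.log(footerBoundContent);
```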
If the OP also wants to capture the date and the user, the first provided regex can be extended, e.g. ...
/Discussion:\n(?<date>[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}).*\n+(?<content>.*)/
/Discussion:\n(?<date>[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}\/[^/]+)\/\s*(?<user>.*?)-?\s*\n+(?<content>.*)/
const multilineMail = `Discussion:
Tue 26/04/2022/2:48 PM UTC+10/ ABC User-
TEST description - this should be logged as a comment. --- This is the part I need
ABC Company Australia | XYZ St | Sydney NSW 2000`;
// see ... [https://regex101.com/r/v8FXCA/2]
const regXMailDateAndContent =
/Discussion:\n(?<date>[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}).*\n+(?<content>.*)/;
// see ... [https://regex101.com/r/v8FXCA/1]
const regXMailDateUserAndContent =
/Discussion:\n(?<date>[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}\/[^/]+)\/\s*(?<user>.*?)-?\s*\n+(?<content>.*)/;
console.log(
regXMailDateAndContent.exec(multilineMail)?.groups
);
console.log(
regXMailDateUserAndContent.exec(multilineMail)?.groups
);
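The `groups` object logged above can also be destructured directly. A small sketch, assuming a hypothetical wrapper named `parseDiscussion` that returns `null` when the mail does not follow the template:

```javascript
const dateUserContentRegex =
  /Discussion:\n(?<date>[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}\/[^/]+)\/\s*(?<user>.*?)-?\s*\n+(?<content>.*)/;

// Hypothetical wrapper: returns a plain object, or null if nothing matched.
function parseDiscussion(mail) {
  const groups = dateUserContentRegex.exec(mail)?.groups;
  if (!groups) {
    return null;
  }
  const { date, user, content } = groups;
  return { date, user, content };
}

const parsed = parseDiscussion(`Discussion:
Tue 26/04/2022/2:48 PM UTC+10/ ABC User-
TEST description - this should be logged as a comment. --- This is the part I need
ABC Company Australia | XYZ St | Sydney NSW 2000`);

console.log(parsed?.date); // "Tue 26/04/2022/2:48 PM UTC+10"
console.log(parsed?.user); // "ABC User"
console.log(parsed?.content);
```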
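But if the content to be extracted is multiline text, the regex has to include the company footer in order to identify the correct match. The second provided regex then changes to ... /Discussion:\n[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}.*\n+(?<content>(?:.*\n)*)ABC Company Australia \| XYZ St \| Sydney NSW 2000/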
const multilineMail = `Discussion:
Tue 26/04/2022/2:48 PM UTC+10/ ABC User-
TEST
description - this should be
logged as a comment. --- This is
the part I need
ABC Company Australia | XYZ St | Sydney NSW 2000`;
// see ... [https://regex101.com/r/v8FXCA/5]
const regXMailMultilineContent =
/Discussion:\n[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}.*\n+(?<content>(?:.*\n)*)ABC Company Australia \| XYZ St \| Sydney NSW 2000/;
console.log(
regXMailMultilineContent.exec(multilineMail)?.groups?.content
);
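Two details of the multiline pattern are worth keeping in mind: the captured content ends with a line break, and the greedy `(?:.*\n)*` extends to the last footer if the footer occurs more than once (for instance in a quoted reply). The sketch below contrasts it with a lazy `*?` variant; the sample mail with a repeated footer is invented for illustration.

```javascript
// Invented sample: the footer appears twice.
const mailWithQuotedReply = `Discussion:
Tue 26/04/2022/2:48 PM UTC+10/ ABC User-
line one
line two
ABC Company Australia | XYZ St | Sydney NSW 2000
older quoted text
ABC Company Australia | XYZ St | Sydney NSW 2000`;

// Greedy: takes as many lines as possible, so it stops at the LAST footer.
const greedyRegex =
  /Discussion:\n[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}.*\n+(?<content>(?:.*\n)*)ABC Company Australia \| XYZ St \| Sydney NSW 2000/;

// Lazy: takes as few lines as possible, so it stops at the FIRST footer.
const lazyRegex =
  /Discussion:\n[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}.*\n+(?<content>(?:.*\n)*?)ABC Company Australia \| XYZ St \| Sydney NSW 2000/;

// trim() removes the trailing line break that the group captures.
const greedyContent = greedyRegex.exec(mailWithQuotedReply)?.groups?.content.trim();
const lazyContent = lazyRegex.exec(mailWithQuotedReply)?.groups?.content.trim();

console.log(greedyContent);
console.log(lazyContent);
```

Which variant is appropriate depends on whether the footer can legitimately appear inside the comment itself.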
All of the above regex patterns make use of named capturing groups.
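For readers unfamiliar with the syntax: a named group `(?<name>...)` is still a numbered group, so the same capture is reachable both by name and by index. A minimal illustration (the variable names are arbitrary):

```javascript
const namedGroupRegex =
  /Discussion:\n[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}.*\n+(?<content>.*)/;

const namedGroupMatch = namedGroupRegex.exec(`Discussion:
Tue 26/04/2022/2:48 PM UTC+10/ ABC User-
TEST description - this should be logged as a comment. --- This is the part I need
ABC Company Australia | XYZ St | Sydney NSW 2000`);

// The only capturing group is both group 1 and groups.content.
const byName = namedGroupMatch.groups.content;
const byIndex = namedGroupMatch[1];
console.log(byName === byIndex); // true
```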