How to restrict matches to the first 5 lines of an email body using regex in GAS

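I'm using the script below, which works fine for extracting 2 fields from an email body.

Because of the amount of content in the body, the script's execution time increases significantly. Is there a way to search through only the first 5 lines of the email body?

The first lines of the email: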

Name: Full Report
Store: River North (Wells St)
Date Tripped: 19 Feb 2020 1:07 PM
Business Date: 19 Feb 2020 (Open)
Message:
Information:
This alert was tripped based on a user defined trigger: Every 15 minutes.

Script:

//gets first(latest) message with set label
var threads = GmailApp.getUserLabelByName('South Loop').getThreads(0,1);
if (threads && threads.length > 0) {
  var message = threads[0].getMessages()[0];
  // Get the first email message of a threads
  var tmp,
    subject = message.getSubject(),
    content = message.getPlainBody();
  // Get the plain text body of the email message
  // You may also use getRawContent() for parsing HTML

  // Implement Parsing rules using regular expressions
  if (content) {

    tmp = content.match(/Date Tripped:\s*([:\w\s]+)\r?\n/);
    var tripped = (tmp && tmp[1]) ? tmp[1].trim() : 'N/A';

    tmp = content.match(/Business Date:\s([\w\s]+\(\w+\))/);
    var businessdate = (tmp && tmp[1]) ? tmp[1].trim() : 'N/A';
  }
}

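You can use the pattern /^(?:.*\r?\n){0,5}/ to grab the first 5 lines of the email, then run your search against that smaller string. Here's a browser example with hardcoded content, but I also tested it in Google Apps Script.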

const Logger = console; // Remove this for GAS!

const content = `Name: Full Report
Store: River North (Wells St)
Date Tripped: 19 Feb 2020 1:07 PM
Business Date: 19 Feb 2020 (Open)
Message:
Information:
This alert was tripped based on a user defined trigger: Every 15 minutes.`;

const searchPattern = /(Date Tripped|Business Date): *(.+?)\r?\n/g;
const matches = [...content.match(/^(?:.*\r?\n){0,5}/)[0]
                           .matchAll(searchPattern)]

const result = Object.fromEntries(matches.map(e => e.slice(1)));
Logger.log(result);
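To plug this into the original script, the same idea could be wrapped in a small function. This is only a sketch: firstLines and extractFields are hypothetical names that don't come from the question, and the code assumes the V8 runtime (const, for...of, matchAll).

```javascript
// Hypothetical helper: returns at most `maxLines` leading lines of `text`.
function firstLines(text, maxLines) {
  // Same pattern as above, with the upper bound made configurable.
  const head = new RegExp("^(?:.*\\r?\\n){0," + maxLines + "}");
  // {0,n} can match zero lines, so match() never returns null here.
  return text.match(head)[0];
}

// Hypothetical helper: collects "Label: value" pairs from the leading lines only.
function extractFields(text, maxLines) {
  const searchPattern = /(Date Tripped|Business Date): *(.+?)\r?\n/g;
  const result = {};
  for (const m of firstLines(text, maxLines).matchAll(searchPattern)) {
    result[m[1]] = m[2];
  }
  return result;
}

const sample = [
  "Name: Full Report",
  "Store: River North (Wells St)",
  "Date Tripped: 19 Feb 2020 1:07 PM",
  "Business Date: 19 Feb 2020 (Open)",
  "Message:",
  "Information:",
  ""
].join("\n");

const fields = extractFields(sample, 5);
console.log(fields);
```

In Apps Script the call would presumably look like extractFields(message.getPlainBody(), 5), with 'N/A' defaults applied to any key missing from the returned object.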

If you want to insert the search terms dynamically, use:

const Logger = console; // Remove this for GAS!

const content = `Name: Full Report
Store: River North (Wells St)
Date Tripped: 19 Feb 2020 1:07 PM
Business Date: 19 Feb 2020 (Open)
Foo: this will match because it's on line 5
Bar: this won't match because it's on line 6
Information:
`;

const searchTerms = ["Date Tripped", "Business Date", "Foo", "Bar"];
// Backslashes are doubled so the regex engine receives \r?\n as escape sequences
const searchPattern = new RegExp(`(${searchTerms.join("|")}): *(.+?)\\r?\\n`, "g");
const matches = [...content.match(/^(?:.*\r?\n){0,5}/)[0]
                           .matchAll(searchPattern)]

const result = Object.fromEntries(matches.map(e => e.slice(1)));
Logger.log(result);
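One caveat with building the pattern from strings: a search term containing regex metacharacters, such as parentheses, would change the meaning of the pattern. A minimal sketch of escaping the terms first is shown below. escapeRegExp and buildSearchPattern are hypothetical helper names, and "Total (USD)" is a made-up label used only for illustration.

```javascript
// Hypothetical helper: escapes characters that have special meaning in a regex.
function escapeRegExp(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds the "Label: value" pattern from plain-text labels.
function buildSearchPattern(terms) {
  const alternatives = terms.map(escapeRegExp).join("|");
  return new RegExp("(" + alternatives + "): *(.+?)\\r?\\n", "g");
}

// "Total (USD)" is an invented label; unescaped, "(USD)" would act as a capture group.
const text = "Total (USD): 42\nDate Tripped: 19 Feb 2020 1:07 PM\n";
const pattern = buildSearchPattern(["Total (USD)", "Date Tripped"]);

const found = {};
for (const m of text.matchAll(pattern)) {
  found[m[1]] = m[2];
}
console.log(found);
```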

ES5 version, in case you're using the older engine:

var Logger = console; // Remove this for GAS!

var content = "Name: Full Report\nStore: River North (Wells St)\nDate Tripped: 19 Feb 2020 1:07 PM\nBusiness Date: 19 Feb 2020 (Open)\nMessage:\nInformation:\nThis alert was tripped based on a user defined trigger: Every 15 minutes.\n";

var searchPattern = /(Date Tripped|Business Date): *(.+?)\r?\n/g;
var truncatedContent = content.match(/^(?:.*\r?\n){0,5}/)[0];
var result = {};

// Search only the truncated string, not the full content
for (var m; m = searchPattern.exec(truncatedContent); result[m[1]] = m[2]);

Logger.log(result);

@ggorlen's answer isn't precise enough for my taste. Let's take a look at regex101.

My problem with (?:.*\r?\n){0,5} is this: in plain English, the regex says:

Take any number of characters (0 or more) ending with a newline. 
Do this between 0 and 5 times.

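That means it also matches empty strings. If you run a global match, you'll get a lot of them.

So how do you grab the first 5 lines? Be exact! Something like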
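A quick way to see the zero-length matches for yourself is the sketch below. The sample strings are made up for illustration, and the last case deliberately drops the ^ anchor and adds the g flag.

```javascript
const lenientPattern = /^(?:.*\r?\n){0,5}/;

// No newline at all: the quantifier settles for zero repetitions.
const noNewline = "single line without a trailing newline".match(lenientPattern);
console.log(JSON.stringify(noNewline[0]));

// Empty input still yields a zero-length match instead of null.
const emptyInput = "".match(lenientPattern);
console.log(emptyInput !== null, JSON.stringify(emptyInput[0]));

// Global flag, no ^ anchor: an extra empty match shows up at the end of the input.
const globalMatches = "a\nb\n".match(/(?:.*\r?\n){0,5}/g);
console.log(JSON.stringify(globalMatches));
```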

^([^\r\n]+\r?\n){5}

regex101

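P.S. @ggorlen pointed out that I had left regex101's default multiline matching enabled, and he's right. Your preference may vary: whether to ignore messages shorter than 5 lines or to accept empty lines and empty matches depends on your specific case.

P.S.2 I adjusted my wording and disabled the multiline and global settings in regex101 to show what concerns me here.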
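To make the trade-off concrete, here is a small sketch that runs both patterns against a few made-up inputs. The head helper and the sample strings are hypothetical and only serve the comparison.

```javascript
const lenient = /^(?:.*\r?\n){0,5}/;   // up to 5 lines, blank lines allowed
const strict = /^([^\r\n]+\r?\n){5}/;  // exactly 5 non-empty lines

// Hypothetical helper: returns the matched text, or null when there is no match.
function head(regex, text) {
  const m = text.match(regex);
  return m ? m[0] : null;
}

const threeLines = "a\nb\nc\n";
const withBlankLine = "a\n\nc\nd\ne\nf\n";
const sixLines = "a\nb\nc\nd\ne\nf\n";

// Fewer than 5 lines: lenient returns what is there, strict rejects the input.
console.log(JSON.stringify(head(lenient, threeLines)), head(strict, threeLines));

// A blank line among the first 5: lenient counts it as a line, strict rejects.
console.log(JSON.stringify(head(lenient, withBlankLine)), head(strict, withBlankLine));

// Six non-empty lines: both patterns return the same first 5 lines.
console.log(head(lenient, sixLines) === head(strict, sixLines));
```

If the emails can start with blank lines or be shorter than 5 lines, the strict pattern returns null, so the calling code needs a null check before reading index 0.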