RegEx to parse the MIME message bodies in between. How?
I am writing an IMAP4 backup application. After a lot of research, I found the correct IMAP command to return all messages, or a range of messages.
SS01 UID FETCH 1:* BODY[]
This beautiful command returns data in the following format:
* 1 FETCH (UID 2 BODY[] {7765}
data to be extracted
from here! which can possibly contain
) <--- one or more prior to its final...
)
* 2 FETCH (UID 3 BODY[] {443}
data to be extracted
from here! which can possibly contain
) <--- one or more prior to its final...
)
* 3 FETCH (UID 4 BODY[] {4432}
data to be extracted
from here! which can possibly contain
) <--- one or more prior to its final...
)
* 4 FETCH (UID 5 BODY[] {123}
data to be extracted
from here! which can possibly contain
) <--- one or more prior to its final...
)
SS01 OK Success
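The only unique patterns I can find in this text are:
The first message starts with...
* 1 FETCH (UID 2 BODY[] {7765}
Every message that is not the last one ends with...
)
* 2 FETCH (UID 3 BODY[] {443}
The last message ends with...
)
SS01 OK Success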
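I found the following example on a website, and I have been trying to implement it without success.
The RegEx pattern is:
(?<=This is)(.*)(?=sentence)
Here is a minimal reproducible example that does not work.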
(\*\s\d+\s\w+\s\(UID\s\d+\sBODY\[\]\s\{\d+\})(.*\n)(\)\n\*\s\d+\s\w+\s\(UID\s\d+\sBODY\[\]\s\{\d+\})
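One likely reason the attempt fails is that `.` does not match a newline by default, so `(.*\n)` can cover at most one line, while each body spans several. The sketch below checks this on a shortened, hypothetical two-message sample. The names `header`, `attempt`, and `multiline` are illustrative and not part of the original post.

```javascript
// Hypothetical two-message sample, shortened from the response shown above.
const sample = [
  '* 1 FETCH (UID 2 BODY[] {7765}',
  'data to be extracted',
  'from here!',
  ')',
  '* 2 FETCH (UID 3 BODY[] {443}',
  'second body',
  ')',
  'SS01 OK Success',
].join('\n');

// The header sub-pattern used twice in the original attempt.
const header = String.raw`\*\s\d+\s\w+\s\(UID\s\d+\sBODY\[\]\s\{\d+\}`;

// The original attempt: (.*\n) can only consume a single line.
const attempt = new RegExp(`(${header})(.*\\n)(\\)\\n${header})`);

// Same idea, but the middle group is allowed to span several lines.
const multiline = new RegExp(`(${header})\\n([\\s\\S]*?)(\\)\\n${header})`);

const attemptMatch = attempt.exec(sample);
const multilineMatch = multiline.exec(sample);

console.log(attemptMatch);      // null: the body is longer than one line
console.log(multilineMatch[2]); // the two body lines of the first message
```

Even the multi-line variant only finds messages that are followed by another FETCH header, so the last message still needs separate handling.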
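You can greatly simplify your regex like this:
\{\d+\}$[\r\n]+([\s\S]+?)^\)$
\{\d+\}$ - finds {digits} at the end of a line (with the m flag, $ matches at every line end)
[\r\n]+ - consumes any newline characters
([\s\S]+?) - lazily captures the desired text, up to whatever follows (read the next point)
^\)$ - finds a line that contains only a closing parenthesis )
The text you want will be in capture group #1.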
https://regex101.com/r/A86eEv/1/
var regex = /\{\d+\}$[\r\n]+([\s\S]+?)^\)$/gm;
var text = `* 1 FETCH (UID 2 BODY[] {7765}
data to be extracted
from here!
)
* 2 FETCH (UID 3 BODY[] {443}
data to be extracted
from here!
)
* 3 FETCH (UID 4 BODY[] {4432}
data to be extracted
from here!
)
* 4 FETCH (UID 5 BODY[] {123}
data to be extracted
from here!
)
SS01 OK Success`;
var matches = [...text.matchAll(regex)];
console.log(Array.from(matches,x => x[1].trim()));
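The question notes that a body can itself contain `)`. As a quick check of how the pattern above behaves in that case, here is a sketch on a hypothetical response whose first body includes a line consisting only of `)`. The sample text and the `{26}` and `{12}` counts are placeholders, not real server output.

```javascript
// Hypothetical response: the first body itself contains a line that is just ")".
const sampleWithParen = [
  '* 1 FETCH (UID 2 BODY[] {26}',
  'first line',
  ')',
  'last line',
  ')',
  '* 2 FETCH (UID 3 BODY[] {12}',
  'second body',
  ')',
  'SS01 OK Success',
].join('\n');

const lineAnchored = /\{\d+\}$[\r\n]+([\s\S]+?)^\)$/gm;
const extracted = Array.from(sampleWithParen.matchAll(lineAnchored), m => m[1].trim());

// The lazy group stops at the first lone ")", so "last line" is lost.
console.log(extracted);
```

If bodies in your mailbox can contain such a line, this pattern truncates them; it is fine when a lone `)` only ever appears as the terminator.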
You can use
/\* \d+ FETCH \(UID \d+ BODY\[] {\d+}\s*([\s\S]*?)(?=\)[\r\n]+(?:\* \d+ FETCH \(UID \d+ BODY\[] {\d+}|SS01 OK Success))/g
See the regex demo. Or, if you do not need to check all of the context this thoroughly, use
/{\d+}\s*([\s\S]*?)(?=\))/g
Details:
\* \d+ FETCH \(UID \d+ BODY\[] {\d+} - *, space, one or more digits, space, FETCH, space, (UID, space, one or more digits, space, BODY[], space, {, one or more digits, }
\s* - zero or more whitespace characters
([\s\S]*?) - Group 1 (the value you need): any zero or more characters, as few as possible
(?=\)[\r\n]+(?:\* \d+ FETCH \(UID \d+ BODY\[] {\d+}|SS01 OK Success)) - a positive lookahead that requires the following sequence of patterns immediately to the right of the current location:
    \) - a ) character
    [\r\n]+ - one or more CR or LF characters
    (?:\* \d+ FETCH \(UID \d+ BODY\[] {\d+}|SS01 OK Success) - either of the two:
        \* \d+ FETCH \(UID \d+ BODY\[] {\d+} - *, space, one or more digits, space, FETCH, space, (UID, space, one or more digits, space, BODY[], space, {, one or more digits, }
        | - or
        SS01 OK Success - the SS01 OK Success string.
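The lookahead is what allows a body to contain `)`: the match only ends at a `)` that is followed by the next FETCH header or by the final status line. The sketch below compares the two patterns on a hypothetical response whose first body contains a lone `)` line. The sample text and the names `strict` and `simplified` are illustrative.

```javascript
// Hypothetical response: the first body contains a line that is just ")".
const embedded = [
  '* 1 FETCH (UID 2 BODY[] {26}',
  'first line',
  ')',
  'last line',
  ')',
  '* 2 FETCH (UID 3 BODY[] {12}',
  'second body',
  ')',
  'SS01 OK Success',
].join('\n');

// Context-checking pattern: ends a body only before ")" + next header or final status.
const strict = /\* \d+ FETCH \(UID \d+ BODY\[] {\d+}\s*([\s\S]*?)(?=\)[\r\n]+(?:\* \d+ FETCH \(UID \d+ BODY\[] {\d+}|SS01 OK Success))/g;

// Simplified pattern: ends a body before the first ")" it sees.
const simplified = /{\d+}\s*([\s\S]*?)(?=\))/g;

const strictBodies = Array.from(embedded.matchAll(strict), m => m[1].trim());
const simplifiedBodies = Array.from(embedded.matchAll(simplified), m => m[1].trim());

console.log(strictBodies);     // first body keeps the embedded ")" line
console.log(simplifiedBodies); // first body is cut at the embedded ")"
```

Two caveats: the tag `SS01` is hard-coded, so it has to match the tag your client actually sent, and a body that happens to contain `)` followed by something that looks like a FETCH header would still be cut early.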
JavaScript demo:
const rx = /\* \d+ FETCH \(UID \d+ BODY\[] {\d+}\s*([\s\S]*?)(?=\)[\r\n]+(?:\* \d+ FETCH \(UID \d+ BODY\[] {\d+}|SS01 OK Success))/g;
const text = '* 1 FETCH (UID 2 BODY[] {7765}\ndata to be extracted\nfrom here!\n)\n* 2 FETCH (UID 3 BODY[] {443}\ndata to be extracted\nfrom here!\n)\n* 3 FETCH (UID 4 BODY[] {4432}\ndata to be extracted\nfrom here!\n)\n* 4 FETCH (UID 5 BODY[] {123}\ndata to be extracted\nfrom here!\n)\nSS01 OK Success';
const matches = [...text.matchAll(rx)];
console.log(Array.from(matches,x => x[1].trim()));
// Or, with the simplified regex:
console.log(
Array.from(text.matchAll(/{\d+}\s*([\s\S]*?)(?=\))/g), x => x[1].trim())
)
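As an alternative to searching for the closing `)`, the `{digits}` value can be used directly: in IMAP it is a literal marker giving the number of octets that follow the line break. The sketch below is a minimal illustration for Node.js under several assumptions: the whole response is already in memory, lines end in CRLF, and each FETCH line has exactly the `UID n BODY[] {n}` shape shown in the question. `extractBodies` is a hypothetical helper, and the counts in the sample are computed from the sample bodies because the counts in the question are only illustrative.

```javascript
// Sketch for Node.js: use the {n} octet count instead of searching for ")".
function extractBodies(raw) {
  const buf = Buffer.from(raw);        // accepts a string (encoded as UTF-8) or a Buffer
  const text = buf.toString('latin1'); // one character per byte, so indexes are byte offsets
  const headerRx = /\* \d+ FETCH \(UID (\d+) BODY\[\] \{(\d+)\}\r\n/g;
  const messages = [];
  let m;
  while ((m = headerRx.exec(text)) !== null) {
    const start = headerRx.lastIndex;  // first byte after the header line
    const end = start + Number(m[2]);  // the literal is exactly n bytes long
    // Decoded as UTF-8 here for display; a backup tool would likely keep the raw bytes.
    messages.push({ uid: Number(m[1]), body: buf.subarray(start, end).toString('utf8') });
    headerRx.lastIndex = end;          // resume after the body, never inside it
  }
  return messages;
}

// Hypothetical sample; the first body contains a lone ")" line.
const bodyA = 'first line\r\n)\r\nlast line\r\n';
const bodyB = 'second body\r\n';
const response =
  `* 1 FETCH (UID 2 BODY[] {${Buffer.byteLength(bodyA)}}\r\n${bodyA})\r\n` +
  `* 2 FETCH (UID 3 BODY[] {${Buffer.byteLength(bodyB)}}\r\n${bodyB})\r\n` +
  'SS01 OK Success\r\n';

const messages = extractBodies(response);
console.log(messages);
```

Because the body is skipped by length, nothing inside it can be mistaken for a terminator or for the next header. Real servers may return additional data items in the FETCH line, so the header pattern would need adjusting for those.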