如何使用正则表达式匹配子字符串并在匹配前后获取一些额外的字符串
How to match a substring using regex and get some extra strings before and after the match
在下面的字符串中,我想提取子字符串。
情景:
只要 wafers_starts
在字符串中匹配,就应该从刚刚匹配的 <ac:structured-macro
中选择该字符串。 (下面的例子有两个,我只需要wafers_starts
之前的那个)
应该选择子串,直到匹配wafers_ends
加上第一个结束标签</ac:structured-macro>
。
示例代码:
if ($matches -ne $null) { Remove-Variable $matches }
$confluenceHtml = "<h2>Description</h2><ac:structured-macro ac:macro-id=""77f3n751-39w7-4746-acd4-bee7586449ed"" ac:name=""warning"" ac:schema-version=""1""><ac:parameter ac:name=""title"">Compatibility</ac:parameter><ac:rich-text-body><p class=""auto-cursor-target""><br/></p><table class=""wrapped""><colgroup> <col/> <col/> <col/> </colgroup><tbody><tr><th><p>Prerequisite</p><p>This needs a progressive rollout </p></td></tr></tbody></table><p class=""auto-cursor-target""><br/></p><p class=""auto-cursor-target""><br/></p><ac:structured-macro ac:macro-id=""4657sd53-e024-4ea3-a5e2-4586667542da"" ac:name=""excerpt"" ac:schema-version=""1""><ac:parameter ac:name=""hidden"">true</ac:parameter><ac:parameter ac:name=""atlassian-macro-output-type"">INLINE</ac:parameter><ac:rich-text-body><p>wafers_starts</p></ac:rich-text-body></ac:structured-macro><h2>Deployment Notes</h2><ac:structured-macro ac:macro-id=""77f5e121-31d7-4576-awq4-bej57t6d39ed"" ac:name=""warning"" ac:schema-version=""1""><ac:parameter ac:name=""title"">Compatibility</ac:parameter> <ac:rich-text-body><p class=""auto-cursor-target""><br/></p><table class=""wrapped""><colgroup> <col/> <col/> <col/> </colgroup><tbody><tr><th><p>Prerequisite</p><p>This needs a progressive rollout 2,3,4,5 and so on</p></td></tr></tbody></table><p class=""auto-cursor-target""><br/></p><ac:structured-macro ac:macro-id=""72d7h552-a5dd-44cc-a4re-6f3247574fbd"" ac:name=""excerpt"" ac:schema-version=""1""><ac:parameter ac:name=""hidden"">true</ac:parameter><ac:parameter ac:name=""atlassian-macro-output-type"">INLINE</ac:parameter><ac:rich-text-body><p>wafers_ends</p></ac:rich-text-body></ac:structured-macro><p class=""auto-cursor-target""><br/></p></ac:structured-macro>"
if ($confluenceHtml -match '\<ac:structured-macro.+?wafers_starts([\s\S]*)wafers_ends.+?\<\/ac:structured-macro\>') {
$matches[0]
}
输出:
<ac:structured-macro ac:macro-id="77f3n751-39w7-4746-acd4-bee7586449ed" ac:name="warning" ac:schema-version="1"><ac:parameter ac:name="title">Compatibility</ac:parameter><ac:rich-text-body><p class="auto-cursor-target"><br/></p><table class="wrapped"><colgroup> <col/> <col/> <col/> </colgroup><tbody><tr><th><p>Prerequisite</p><p>This needs a progressive rollout </p></td></tr></tbody></table><p class="auto-cursor-target"><br/></p><p class="auto-cursor-target"><br/></p><ac:structured-macro ac:macro-id="4657sd53-e024-4ea3-a5e2-4586667542da" ac:name="excerpt" ac:schema-version="1"><ac:parameter ac:name="hidden">true</ac:parameter><ac:parameter ac:name="atlassian-macro-output-type">INLINE</ac:parameter><ac:rich-text-body><p>wafers_starts</p></ac:rich-text-body></ac:structured-macro><h2>Deployment Notes</h2><ac:structured-macro ac:macro-id="77f5e121-31d7-4576-awq4-bej57t6d39ed" ac:name="warning" ac:schema-version="1"><ac:parameter ac:name="title">Compatibility</ac:parameter> <ac:rich-text-body><p class="auto-cursor-target"><br/></p><table class="wrapped"><colgroup> <col/> <col/> <col/> </colgroup><tbody><tr><th><p>Prerequisite</p><p>This needs a progressive rollout 2,3,4,5 and so on</p></td></tr></tbody></table><p class="auto-cursor-target"><br/></p><ac:structured-macro ac:macro-id="72d7h552-a5dd-44cc-a4re-6f3247574fbd" ac:name="excerpt" ac:schema-version="1"><ac:parameter ac:name="hidden">true</ac:parameter><ac:parameter ac:name="atlassian-macro-output-type">INLINE</ac:parameter><ac:rich-text-body><p>wafers_ends</p></ac:rich-text-body></ac:structured-macro>
问题:
子串结尾OK。但是,即使经过多次尝试也无法获取子字符串的开头。正则表达式包括从 <ac:structured-macro
第一次出现的开始。
期望输出:
我只想要下面的子字符串,它只包含一次 <ac:structured-macro
,就在第一个匹配字符串 wafers_starts
之前
<ac:structured-macro ac:macro-id="4657sd53-e024-4ea3-a5e2-4586667542da" ac:name="excerpt" ac:schema-version="1"><ac:parameter ac:name="hidden">true</ac:parameter><ac:parameter ac:name="atlassian-macro-output-type">INLINE</ac:parameter><ac:rich-text-body><p>wafers_starts</p></ac:rich-text-body></ac:structured-macro><h2>Deployment Notes</h2><ac:structured-macro ac:macro-id="77f5e121-31d7-4576-awq4-bej57t6d39ed" ac:name="warning" ac:schema-version="1"><ac:parameter ac:name="title">Compatibility</ac:parameter> <ac:rich-text-body><p class="auto-cursor-target"><br/></p><table class="wrapped"><colgroup> <col/> <col/> <col/> </colgroup><tbody><tr><th><p>Prerequisite</p><p>This needs a progressive rollout 2,3,4,5 and so on</p></td></tr></tbody></table><p class="auto-cursor-target"><br/></p><ac:structured-macro ac:macro-id="72d7h552-a5dd-44cc-a4re-6f3247574fbd" ac:name="excerpt" ac:schema-version="1"><ac:parameter ac:name="hidden">true</ac:parameter><ac:parameter ac:name="atlassian-macro-output-type">INLINE</ac:parameter><ac:rich-text-body><p>wafers_ends</p></ac:rich-text-body></ac:structured-macro>
问题:
正在寻找 correted/working 正则表达式模式。
您需要使用此正则表达式,它使用 tempered greedy token (?:(?!ac:structured-macro).)+
模式来拒绝 ac:structured-macro
在第一次匹配后的任何进一步匹配。
<ac:structured-macro(?:(?!ac:structured-macro).)+wafers_starts([\s\S]*)wafers_ends.+?<\/ac:structured-macro>
在下面的字符串中,我想提取子字符串。
情景:
只要
wafers_starts
在字符串中匹配,就应该从刚刚匹配的<ac:structured-macro
中选择该字符串。 (下面的例子有两个,我只需要wafers_starts
之前的那个)应该选择子串,直到匹配
wafers_ends
加上第一个结束标签</ac:structured-macro>
。
示例代码:
if ($matches -ne $null) { Remove-Variable $matches }
$confluenceHtml = "<h2>Description</h2><ac:structured-macro ac:macro-id=""77f3n751-39w7-4746-acd4-bee7586449ed"" ac:name=""warning"" ac:schema-version=""1""><ac:parameter ac:name=""title"">Compatibility</ac:parameter><ac:rich-text-body><p class=""auto-cursor-target""><br/></p><table class=""wrapped""><colgroup> <col/> <col/> <col/> </colgroup><tbody><tr><th><p>Prerequisite</p><p>This needs a progressive rollout </p></td></tr></tbody></table><p class=""auto-cursor-target""><br/></p><p class=""auto-cursor-target""><br/></p><ac:structured-macro ac:macro-id=""4657sd53-e024-4ea3-a5e2-4586667542da"" ac:name=""excerpt"" ac:schema-version=""1""><ac:parameter ac:name=""hidden"">true</ac:parameter><ac:parameter ac:name=""atlassian-macro-output-type"">INLINE</ac:parameter><ac:rich-text-body><p>wafers_starts</p></ac:rich-text-body></ac:structured-macro><h2>Deployment Notes</h2><ac:structured-macro ac:macro-id=""77f5e121-31d7-4576-awq4-bej57t6d39ed"" ac:name=""warning"" ac:schema-version=""1""><ac:parameter ac:name=""title"">Compatibility</ac:parameter> <ac:rich-text-body><p class=""auto-cursor-target""><br/></p><table class=""wrapped""><colgroup> <col/> <col/> <col/> </colgroup><tbody><tr><th><p>Prerequisite</p><p>This needs a progressive rollout 2,3,4,5 and so on</p></td></tr></tbody></table><p class=""auto-cursor-target""><br/></p><ac:structured-macro ac:macro-id=""72d7h552-a5dd-44cc-a4re-6f3247574fbd"" ac:name=""excerpt"" ac:schema-version=""1""><ac:parameter ac:name=""hidden"">true</ac:parameter><ac:parameter ac:name=""atlassian-macro-output-type"">INLINE</ac:parameter><ac:rich-text-body><p>wafers_ends</p></ac:rich-text-body></ac:structured-macro><p class=""auto-cursor-target""><br/></p></ac:structured-macro>"
if ($confluenceHtml -match '\<ac:structured-macro.+?wafers_starts([\s\S]*)wafers_ends.+?\<\/ac:structured-macro\>') {
$matches[0]
}
输出:
<ac:structured-macro ac:macro-id="77f3n751-39w7-4746-acd4-bee7586449ed" ac:name="warning" ac:schema-version="1"><ac:parameter ac:name="title">Compatibility</ac:parameter><ac:rich-text-body><p class="auto-cursor-target"><br/></p><table class="wrapped"><colgroup> <col/> <col/> <col/> </colgroup><tbody><tr><th><p>Prerequisite</p><p>This needs a progressive rollout </p></td></tr></tbody></table><p class="auto-cursor-target"><br/></p><p class="auto-cursor-target"><br/></p><ac:structured-macro ac:macro-id="4657sd53-e024-4ea3-a5e2-4586667542da" ac:name="excerpt" ac:schema-version="1"><ac:parameter ac:name="hidden">true</ac:parameter><ac:parameter ac:name="atlassian-macro-output-type">INLINE</ac:parameter><ac:rich-text-body><p>wafers_starts</p></ac:rich-text-body></ac:structured-macro><h2>Deployment Notes</h2><ac:structured-macro ac:macro-id="77f5e121-31d7-4576-awq4-bej57t6d39ed" ac:name="warning" ac:schema-version="1"><ac:parameter ac:name="title">Compatibility</ac:parameter> <ac:rich-text-body><p class="auto-cursor-target"><br/></p><table class="wrapped"><colgroup> <col/> <col/> <col/> </colgroup><tbody><tr><th><p>Prerequisite</p><p>This needs a progressive rollout 2,3,4,5 and so on</p></td></tr></tbody></table><p class="auto-cursor-target"><br/></p><ac:structured-macro ac:macro-id="72d7h552-a5dd-44cc-a4re-6f3247574fbd" ac:name="excerpt" ac:schema-version="1"><ac:parameter ac:name="hidden">true</ac:parameter><ac:parameter ac:name="atlassian-macro-output-type">INLINE</ac:parameter><ac:rich-text-body><p>wafers_ends</p></ac:rich-text-body></ac:structured-macro>
问题:
子串结尾OK。但是,即使经过多次尝试也无法获取子字符串的开头。正则表达式包括从 <ac:structured-macro
第一次出现的开始。
期望输出:
我只想要下面的子字符串,它只包含一次 <ac:structured-macro
,就在第一个匹配字符串 wafers_starts
<ac:structured-macro ac:macro-id="4657sd53-e024-4ea3-a5e2-4586667542da" ac:name="excerpt" ac:schema-version="1"><ac:parameter ac:name="hidden">true</ac:parameter><ac:parameter ac:name="atlassian-macro-output-type">INLINE</ac:parameter><ac:rich-text-body><p>wafers_starts</p></ac:rich-text-body></ac:structured-macro><h2>Deployment Notes</h2><ac:structured-macro ac:macro-id="77f5e121-31d7-4576-awq4-bej57t6d39ed" ac:name="warning" ac:schema-version="1"><ac:parameter ac:name="title">Compatibility</ac:parameter> <ac:rich-text-body><p class="auto-cursor-target"><br/></p><table class="wrapped"><colgroup> <col/> <col/> <col/> </colgroup><tbody><tr><th><p>Prerequisite</p><p>This needs a progressive rollout 2,3,4,5 and so on</p></td></tr></tbody></table><p class="auto-cursor-target"><br/></p><ac:structured-macro ac:macro-id="72d7h552-a5dd-44cc-a4re-6f3247574fbd" ac:name="excerpt" ac:schema-version="1"><ac:parameter ac:name="hidden">true</ac:parameter><ac:parameter ac:name="atlassian-macro-output-type">INLINE</ac:parameter><ac:rich-text-body><p>wafers_ends</p></ac:rich-text-body></ac:structured-macro>
问题:
正在寻找 correted/working 正则表达式模式。
您需要使用此正则表达式,它使用 tempered greedy token (?:(?!ac:structured-macro).)+
模式来拒绝 ac:structured-macro
在第一次匹配后的任何进一步匹配。
<ac:structured-macro(?:(?!ac:structured-macro).)+wafers_starts([\s\S]*)wafers_ends.+?<\/ac:structured-macro>