Regex lookbehind

Question

I'm having trouble with a regex lookbehind!

Here is my sample text:

 href="dermatitis&gt;" "blah blah blah &gt;" href="lichen-planus&gt;"

I want to match every &gt;" if and only if it is preceded by an href=, and there is one more rule!

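The href= must sit immediately before the previous quote. (For example, the second &gt; in the text is preceded by an href=, but that href= is not directly before the previous quote, so I don't want it to match.) My text contains three &gt;; based on the rules above, I want the first and third to match and the second not to.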
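To pin the rule down, here is a minimal regex-free sketch in Python (the language and the helper name qualifying_positions are illustrative assumptions; the thread itself uses PowerShell and Notepad++). For each &gt;" it finds the nearest preceding quote and checks that href= ends right before that quote.

```python
# Sample text from the question.
TEXT = 'href="dermatitis&gt;" "blah blah blah &gt;" href="lichen-planus&gt;"'
TARGET = '&gt;"'


def qualifying_positions(text: str, target: str = TARGET) -> list[int]:
    """Hypothetical helper: start offsets of `target` whose previous quote directly follows 'href='."""
    hits = []
    start = text.find(target)
    while start != -1:
        # Nearest double quote strictly before this occurrence of the target.
        quote = text.rfind('"', 0, start)
        # Keep the occurrence only if 'href=' ends exactly at that quote.
        if quote != -1 and text[:quote].endswith("href="):
            hits.append(start)
        start = text.find(target, start + 1)
    return hits


print(qualifying_positions(TEXT))  # [16, 63]: the first and third &gt;"
```

The second &gt;" is rejected because its previous quote is preceded by a space, not by href=.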

I hope the question is explained clearly enough! I'm processing some offline text files, and I can use Notepad++, PowerShell, or any other suitable engine.

Any help would be appreciated.

Answer 1

One way to do this in PowerShell, if your input always has spaces between items:

$a = 'href="dermatitis&gt;" "blah blah blah &gt;" href="lichen-planus&gt;"'
$b = $a.Split(" ")
$c = $b | ? { $_ -match 'href="' }
Write-Output $c
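For comparison, the same split-and-filter idea as a small Python sketch (an illustrative translation, not the answerer's code). Like the PowerShell version, it relies on single spaces between items and returns whole tokens rather than just the &gt;".

```python
# Sample input from the question (items separated by single spaces).
text = 'href="dermatitis&gt;" "blah blah blah &gt;" href="lichen-planus&gt;"'

# Split on spaces, like $a.Split(" "), then keep tokens containing href=".
tokens = text.split(" ")
with_href = [t for t in tokens if 'href="' in t]

for t in with_href:
    print(t)
```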

Answer 2

Another way to attack it with PowerShell, which also removes the unwanted &gt;:

# Set the regular expression
$regex = '(?<=href\=")(.*?)(?=")'

$sampleText = 'href="dermatitis&gt;" "blah blah blah &gt;" href="lichen-planus&gt;"'

# Split the single-line string on spaces (" " as the delimiter)
$sampleTextSplit = $sampleText.Split(" ")

$sampleMatches = $sampleTextSplit | Where-Object {$_ -match $regex} | Foreach-Object { $_.Replace("&gt;","") }

# Show the results
$sampleMatches

This returns two objects:

href="dermatitis"
href="lichen-planus"
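A variant of the same regex in Python, applied to the whole string without splitting (a sketch; the Python translation is an assumption, not part of the answer). Because the lookbehind only has to check the literal href=", it is fixed-width, which Python's re module requires.

```python
import re

text = 'href="dermatitis&gt;" "blah blah blah &gt;" href="lichen-planus&gt;"'

# Fixed-width lookbehind (literal href="), lazy capture up to the next quote.
pattern = re.compile(r'(?<=href=")(.*?)(?=")')

values = pattern.findall(text)  # the href values, still ending in &gt;
cleaned = [v.replace("&gt;", "") for v in values]  # drop the unwanted &gt;
print(cleaned)  # ['dermatitis', 'lichen-planus']
```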

Answer 3

Notepad++ does not support a variable-length lookbehind such as (?<=href="[^"]*), so you have to use \K instead.

Ctrl+F
Find what: href="[^"]*\K&gt;(?=")
Check Wrap around
Check Regular expression
Find All in Current Document

Explanation:

href="[^"]* : search for href=" followed by 0 or more characters other than "
\K          : forget everything matched up to this position
&gt;        : literally &gt;
(?=")       : lookahead, make sure a " follows
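Python's standard re module has no \K, so a sketch of the same pattern there (an assumption-labeled translation, not the answerer's method) captures the &gt; in a group and reads that group's span; a replacement keeps the prefix with a backreference instead of "forgetting" it.

```python
import re

text = 'href="dermatitis&gt;" "blah blah blah &gt;" href="lichen-planus&gt;"'

# Same idea as href="[^"]*\K&gt;(?="): [^"]* cannot cross a quote,
# so href= must belong to the quote right before the &gt;.
pattern = re.compile(r'href="[^"]*(&gt;)(?=")')

# Span of group 1 = where the matched &gt; sits, like the \K match in Notepad++.
spans = [m.span(1) for m in pattern.finditer(text)]
print(spans)  # [(16, 20), (63, 67)]

# To delete only the qualifying &gt;, capture the prefix and put it back.
removed = re.sub(r'(href="[^"]*)&gt;(?=")', r"\1", text)
print(removed)  # href="dermatitis" "blah blah blah &gt;" href="lichen-planus"
```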

Answer 4

I know I'm 2 years late, but anyway :) here is a solution:

$string = 'href="dermatitis&gt;" "blah blah blah &gt;" href="lichen-planus&gt;"'
$value = '&gt;"'
$regex = 'href=".+?(' + $value + ')'
([regex]::matches($string,$regex).groups.value) | ? {$_ -eq $value}

which returns the first and third values:

&gt;"
&gt;"
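A Python sketch of this pattern (an illustrative translation, not the answerer's code). It gives the expected result on the sample, but .+? can run past a quote, so on a hypothetical input where the href value has no &gt; it also matches a later &gt;", which the question's second rule forbids. Swapping .+? for [^"]+? keeps the match inside the href value.

```python
import re

value = '&gt;"'
# Same shape as the answer: href=" then a lazy run, then the captured value.
loose = re.compile('href=".+?(' + re.escape(value) + ')')
# Variant whose lazy run cannot cross a quote.
strict = re.compile('href="[^"]+?(' + re.escape(value) + ')')

sample = 'href="dermatitis&gt;" "blah blah blah &gt;" href="lichen-planus&gt;"'
print(loose.findall(sample))   # ['&gt;"', '&gt;"']

# Hypothetical input: the href value itself has no &gt;.
tricky = 'href="plain" "blah &gt;"'
print(loose.findall(tricky))   # ['&gt;"'], matched across the quote
print(strict.findall(tricky))  # []
```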
