Regex to find duplicate file names in tree output

Question

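Please help me find duplicate file names in tree output. I am having trouble using a regular expression in SublimeText to find them. The file I am searching is the output of the tree command, redirected to a text file: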

src/test/resources
|-- WPCDPS
|   `-- RiskIndicatorsEvaluationRuleTest.feature
|-- Accelerated.feature
|-- AcceptedAFS.feature
|-- AgeValidationRemoval.feature
|-- Anxiety.feature
|-- CheckDisabledOccupation.feature
|-- Extended.feature
|-- Financal.feature
|-- Fainancial.feature
|-- FloridaSpecific.feature
|-- Hypertension.feature
|-- LifeForceOrPending.feature
|-- LifestyleInformation.feature
|-- MinValue.feature
|-- Occupations
|   |-- OccupationTranslation.feature
|   |-- OccupationsWithPreConditions.feature
|   |-- Accelerated.feature
|   `-- OccupationsWithoutPreConditions.feature
|-- Florida.feature

I have tried different combinations of (?m)(\bAccelerated\.feature\b)(?=[\s\|\-\n]*), but without success.

Answer 1

Code

See regex in use here

((?<= )[\w.]+$)(?=[\s\S]*(?<= )(\1)$)

Alternatively, with the s (single-line) modifier (regex demo here):

((?<= )[\w.]+$)(?=.*(?<= )(\1)$)

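Note: for some reason, the pattern above (with single-line mode enabled) causes regex101 to time out. I tested it under regex101's Code Generator and it works fine there (it doesn't time out when used in code).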
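To illustrate that the two variants are interchangeable, here is a minimal Python sketch. It assumes the second group contains the back-reference \1 and that multi-line mode is on so that $ matches at every line end; re.DOTALL is Python's name for the s modifier. The three-line sample string is made up for the demonstration.

```python
import re

# Variant 1: [\s\S]* crosses newlines without any extra flag.
explicit = re.compile(r"((?<= )[\w.]+$)(?=[\s\S]*(?<= )(\1)$)", re.MULTILINE)
# Variant 2: a plain . crosses newlines because re.DOTALL (the s flag) is set.
dotall = re.compile(r"((?<= )[\w.]+$)(?=.*(?<= )(\1)$)", re.MULTILINE | re.DOTALL)

# Hypothetical mini tree output with one repeated name.
sample = "|-- Accelerated.feature\n|-- Anxiety.feature\n|   |-- Accelerated.feature\n"

explicit_hits = [m.group(1) for m in explicit.finditer(sample)]
dotall_hits = [m.group(1) for m in dotall.finditer(sample)]
print(explicit_hits, dotall_hits)
```

Both lists should contain the same single name, since only the first Accelerated.feature has a later copy.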

Explanation

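((?<= )[\w.]+$) captures the following into capture group 1:
- (?<= ) Positive lookbehind ensuring that what precedes is a space
- [\w.]+ Matches any word character or dot, one or more times
- $ Asserts position at the end of the line
(?=[\s\S]*(?<= )(\1)$) Positive lookahead ensuring that what follows matches:
- [\s\S]* Any character, any number of times. (The dot is not used here because by default it does not match newlines. With the single-line flag enabled you can replace [\s\S] with ., since that flag forces . to match newlines as well.)
- (?<= ) Positive lookbehind ensuring that what precedes is a space
- (\1) captures the following into capture group 2:
  - \1 Matches the same text as most recently matched by capture group 1
- $ Asserts position at the end of the line
In plain terms: the pattern matches a whole file name when it comes right after a space and runs to the end of the line. It then looks ahead for a later copy that follows the same rule (preceded by a space, ending at the end of a line).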
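Since the question is also tagged python, the explanation above can be tried outside the editor with a short sketch like the following. The helper name find_repeated_names is hypothetical, the sample is a shortened copy of the tree output from the question, and the pattern assumes names consist only of word characters and dots with no trailing whitespace on the line.

```python
import re

# Assumption: the second group holds the back-reference \1, and
# re.MULTILINE is set so that $ matches at the end of every line.
DUPLICATE_NAME = re.compile(r"((?<= )[\w.]+$)(?=[\s\S]*(?<= )(\1)$)", re.MULTILINE)

def find_repeated_names(tree_text):
    """Return (line_number, name) for each name that occurs again further down."""
    hits = []
    for match in DUPLICATE_NAME.finditer(tree_text):
        # Count the newlines before the match to get a 1-based line number.
        line_number = tree_text.count("\n", 0, match.start()) + 1
        hits.append((line_number, match.group(1)))
    return hits

sample = """\
src/test/resources
|-- WPCDPS
|   `-- RiskIndicatorsEvaluationRuleTest.feature
|-- Accelerated.feature
|-- FloridaSpecific.feature
|-- Occupations
|   |-- OccupationTranslation.feature
|   |-- Accelerated.feature
|   `-- OccupationsWithoutPreConditions.feature
|-- Florida.feature
"""

print(find_repeated_names(sample))
```

For this sample the helper reports Accelerated.feature on line 4, the first of its two occurrences. FloridaSpecific.feature and Florida.feature are not confused with each other, because the lookbehind forces every candidate to start right after a space. A name that appears three times would be reported twice, once for each occurrence that still has a later copy.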
