Notepad++ regex to get each unique number located between two recurring tags
I have a file with 16k lines of text, opened in Notepad++.
Every few lines, a block like this appears:
a few rows of text before
Unique Identifier
628612012-078
Title
another few rows of text until the next Unique Identifier row
A few more lines of other content follow, and then the pattern repeats with a different number in the middle:
a few rows of text before
Unique Identifier
1991-18613-001
Title
another few rows of text until the next Unique Identifier row
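How can a regex get (copy/save) each ID number located between each Unique Identifier row and the Title row that follows it?
I don't mind whether it deletes the rest of the text in the file or saves the output to another file. Ideally, I just need the numbers listed in order.
I have tried to adapt an existing approach to this, but it did not work.
If the file has Unix (LF) line endings, the regex is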
(?<=Unique Identifier\n).+(?=\nTitle)
Then use "Mark All" and "Copy Marked Text" to put all the matches on the clipboard.
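For anyone who would rather script the extraction than use the Mark dialog, the same lookaround idea can be sketched in Python. This is only a minimal sketch, assuming LF line endings as stated above; the function name extract_ids and the sample text are made up for illustration.

```python
import re

# Same lookaround pattern as above; assumes Unix (LF) line endings.
ID_PATTERN = re.compile(r"(?<=Unique Identifier\n).+(?=\nTitle)")


def extract_ids(text):
    """Return every line sitting between 'Unique Identifier' and 'Title', in file order."""
    return ID_PATTERN.findall(text)


# Hypothetical sample mirroring the layout described in the question.
sample = (
    "a few rows of text before\n"
    "Unique Identifier\n"
    "628612012-078\n"
    "Title\n"
    "another few rows of text\n"
    "Unique Identifier\n"
    "1991-18613-001\n"
    "Title\n"
    "more text\n"
)

print("\n".join(extract_ids(sample)))
```

The lookbehind works in Python because 'Unique Identifier\n' has a fixed width; a file with CRLF line endings would need the pattern adjusted.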
Alternatively, use
^Unique Identifier\R(\d+(?:-\d+)*)(?=\RTitle$)|^(?!Unique Identifier$).*\R?
Replace with: (?1$1\n:)
- the value of the first capture group followed by a line break, or nothing.
Explanation
--------------------------------------------------------------------------------
  ^                        the beginning of the line
--------------------------------------------------------------------------------
  Unique Identifier        'Unique Identifier'
--------------------------------------------------------------------------------
  \R                       any line ending sequence
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \d+                    digits (0-9) (1 or more times (matching
                           the most amount possible))
--------------------------------------------------------------------------------
    (?:                    group, but do not capture (0 or more
                           times (matching the most amount
                           possible)):
--------------------------------------------------------------------------------
      -                    '-'
--------------------------------------------------------------------------------
      \d+                  digits (0-9) (1 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
    )*                     end of grouping
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \R                     any line ending sequence
--------------------------------------------------------------------------------
    Title                  'Title'
--------------------------------------------------------------------------------
    $                      the end of the line
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
 |                         OR
--------------------------------------------------------------------------------
  ^                        the beginning of the line
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    Unique Identifier      'Unique Identifier'
--------------------------------------------------------------------------------
    $                      the end of the line
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  \R?                      any line ending sequence (optional)
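The replace-everything-else approach can also be tried outside Notepad++. The following is a rough Python sketch, not the Notepad++ behavior itself: Python's re module supports neither \R nor conditional replacements of the (?1...:...) form, so line endings are normalized to \n first and a replacement function stands in for the conditional. The name keep_ids and the sample text are hypothetical.

```python
import re

# \R is swapped for \n here, which is why the text is normalized before matching.
PATTERN = re.compile(
    r"^Unique Identifier\n(\d+(?:-\d+)*)(?=\nTitle$)"  # tag line plus the ID line
    r"|^(?!Unique Identifier$).*\n?",                  # any other line, to be removed
    re.MULTILINE,
)


def keep_ids(text):
    """Strip everything except the IDs and return them as a list, in file order."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    def replacement(match):
        # Group 1 is set only when the first alternative matched.
        return match.group(1) + "\n" if match.group(1) is not None else ""

    reduced = PATTERN.sub(replacement, text)
    # The line break that followed each ID is not consumed by the match,
    # so drop the blank lines it leaves behind.
    return [line for line in reduced.splitlines() if line]


# Hypothetical sample with Windows (CRLF) line endings.
sample = (
    "a few rows of text before\r\n"
    "Unique Identifier\r\n"
    "628612012-078\r\n"
    "Title\r\n"
    "another few rows of text\r\n"
    "Unique Identifier\r\n"
    "1991-18613-001\r\n"
    "Title\r\n"
    "more text\r\n"
)

print(keep_ids(sample))
```

Returning a list rather than the reduced text keeps the blank-line cleanup in one place; joining the list with "\n" gives the plain column of numbers asked for in the question.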