A regex for detecting multi-line text when each line ends with a given character
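I have a parser that parses code written in the PAWN language.
I already have a regex that parses the defines in that code. A typical define looks like this: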
#define DEFINE_NAME DEFINE_VALUE
and I use the following regex to detect it:
#define[ \t]+([^\n\r\s\;]+)(?:[ \t]*([^\s;]+))?
Now for the actual problem. PAWN only allows a define to span multiple lines if each line ends with a backslash. So this would be valid:
#define DEFINE_NAME \
DEFINE_VALUE \
CONTINUE_VALUE
It can keep going as long as there are more backslashes.
So I want a regex that can capture the possibly multi-line content.
NOTE: I also need it to work for single-line defines, so please keep that in mind.
Also, I use .NET, so that's the flavor.
Any help/contribution is much appreciated. :D
We can include an optional backslash followed by a line break:
(?:\\\r?\n[ \t]*)?
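As a quick check of that fragment, here is a minimal sketch (the module name and the sample strings are made up for illustration). It assumes the continuation is written as an escaped backslash `\\` followed by `\r?\n`, so that both LF and CRLF line endings are accepted:

```vb
Imports System
Imports System.Text.RegularExpressions

Module ContinuationDemo
    Sub Main()
        ' A literal backslash, an optional CR, an LF, then any indentation.
        Dim continuation As New Regex("\\\r?\n[ \t]*")

        ' Backslash followed by an LF line ending
        Console.WriteLine(continuation.IsMatch("A \" & vbLf & "  B"))    ' True
        ' Backslash followed by a CRLF line ending
        Console.WriteLine(continuation.IsMatch("A \" & vbCrLf & "  B"))  ' True
        ' Line break without a preceding backslash
        Console.WriteLine(continuation.IsMatch("A" & vbCrLf & "B"))      ' False
    End Sub
End Module
```

VB string literals do not treat the backslash as an escape character, so the pattern text reaches the regex engine exactly as typed.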
Then, to allow multiple lines that each end with a backslash, we can repeat the following construct:
(?<value>(?>              # Captures the DEFINE_VALUE
    [^\\\r\n;]+           # Any char except \ CR LF ;
  |                       # or
    \\[^\r\n][^\\\r\n;]*  # "\" within value
)+)?                      # (~unrolling the loop)
(?:\\\r?\n[ \t]*)?        # allow "\" for new line
Code
Imports System
Imports System.Text.RegularExpressions

Module Program
    Sub Main()
        ' Verbose pattern: one fragment per line, joined with line breaks so that
        ' each "#" comment ends at the end of its own line.
        Dim pattern As String =
            "^[ \t]*                     # beginning of line        " & vbCrLf &
            "[#]define[ \t]+             # PAWN #define             " & vbCrLf &
            "(?<name>[^\s\;]+)           # DEFINE_NAME              " & vbCrLf &
            "[ \t]*(?:\\\r?\n[ \t]*)?    # spaces and optional \    " & vbCrLf &
            "(?>                         #                          " & vbCrLf &
            "  (?<value>(?>              # DEFINE_VALUE             " & vbCrLf &
            "    [^\\\r\n;]+ |           # any char except \ CR LF ;" & vbCrLf &
            "    \\[^\r\n][^\\\r\n;]*    # \ within value           " & vbCrLf &
            "  )+)?                      # (~unrolling the loop)    " & vbCrLf &
            "  (?:\\\r?\n[ \t]*)?        # \ for new line           " & vbCrLf &
            ")*                          # repeated for each line"

        Dim re As Regex = New Regex(pattern, RegexOptions.Multiline Or
                                             RegexOptions.IgnorePatternWhitespace)

        Dim text As String = "#define DEFINE_NAME \" & vbCrLf &
                             "    DEFINE_VALUE\" & vbCrLf &
                             "    CONTINUE_VALUE" & vbCrLf &
                             "#define TheName TheValue"

        Dim mNum As Integer = 0
        Dim matches As MatchCollection = re.Matches(text)

        'Loop Matches
        For Each match As Match In matches
            'get name
            Dim name As String = match.Groups("name").Value
            Console.WriteLine("Match #{0} - Name: {1}", mNum, name)

            'get values (one capture per line of the define)
            Dim captureCtr As Integer = 0
            For Each capture As Capture In match.Groups("value").Captures
                'loop captures for the Group "value"
                Console.WriteLine(vbTab & "Line #{0} - Value: {1}",
                                  captureCtr, capture.Value)
                captureCtr += 1
            Next
            mNum += 1
        Next
    End Sub
End Module
Output
Match #0 - Name: DEFINE_NAME
	Line #0 - Value: DEFINE_VALUE
	Line #1 - Value: CONTINUE_VALUE
Match #1 - Name: TheName
	Line #0 - Value: TheValue
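If you would rather have the whole value as a single string than one entry per line, the captures can be joined afterwards. The following is only a sketch: `JoinCaptures` is a hypothetical helper, the pattern is a compact single-line restatement of the verbose one (with the continuation written as `\\\r?\n`), and joining the pieces with a single space is an assumption about what the caller wants.

```vb
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module JoinCapturesDemo
    ' Hypothetical helper: joins every capture of the named group into one
    ' string, trimming the whitespace around each piece.
    Function JoinCaptures(m As Match, groupName As String) As String
        Dim parts = m.Groups(groupName).Captures.
            Cast(Of Capture)().
            Select(Function(c) c.Value.Trim())
        Return String.Join(" ", parts)
    End Function

    Sub Main()
        ' Compact form of the pattern, without IgnorePatternWhitespace.
        Dim pattern As String =
            "^[ \t]*#define[ \t]+(?<name>[^\s;]+)[ \t]*(?:\\\r?\n[ \t]*)?" &
            "(?>(?<value>(?>[^\\\r\n;]+|\\[^\r\n][^\\\r\n;]*)+)?(?:\\\r?\n[ \t]*)?)*"

        Dim text As String = "#define DEFINE_NAME \" & vbCrLf &
                             "    DEFINE_VALUE \" & vbCrLf &
                             "    CONTINUE_VALUE"

        Dim m As Match = Regex.Match(text, pattern, RegexOptions.Multiline)
        If m.Success Then
            ' Prints: DEFINE_NAME = DEFINE_VALUE CONTINUE_VALUE
            Console.WriteLine("{0} = {1}", m.Groups("name").Value,
                              JoinCaptures(m, "value"))
        End If
    End Sub
End Module
```

The trim matters because a value line such as `DEFINE_VALUE \` is captured together with the space before the backslash.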
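Notice that I'm using named groups, (?<name>..) and (?<value>..). That's why they are referenced in the code as match.Groups("name") and match.Groups("value").
Also, the group (?<value>..) is repeated once for each line. Groups("value") on its own only holds information about the last substring captured, but its Captures property contains information about all the substrings captured by the group. This is a .NET feature that most other regex flavors lack.
That's why I loop over match.Groups("value").Captures.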
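To see the difference between the group's value and its captures in isolation, here is a minimal sketch. The input string, the group name `word`, and the module name are made up for illustration and are unrelated to PAWN:

```vb
Imports System
Imports System.Text.RegularExpressions

Module CapturesDemo
    Sub Main()
        ' The group "word" sits inside a repeated group, so it captures once per word.
        Dim m As Match = Regex.Match("one two three", "^(?:(?<word>\w+)\s*)+$")

        ' Group.Value reports only the last capture.
        Console.WriteLine(m.Groups("word").Value)            ' three

        ' Group.Captures keeps every capture, in the order they were matched.
        For Each c As Capture In m.Groups("word").Captures
            Console.WriteLine(c.Value)                       ' one, two, three
        Next
    End Sub
End Module
```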