A regex for detecting multiline text if ended with a character

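I have a parser that parses code written in the PAWN language.

I already have a regex that parses the defines in that code. A typical define looks like this: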

#define DEFINE_NAME DEFINE_VALUE

And I use the following regex to detect it:

#define[ \t]+([^\n\r\s\;]+)(?:[ \t]*([^\s;]+))?

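Now the actual problem. The PAWN language only allows a define to span multiple lines if each continued line ends with a backslash. So this would be valid: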

#define DEFINE_NAME \
    DEFINE_VALUE    \
    CONTINUE_VALUE

It can keep going as long as there are more backslashes.

Great.. I want a regex that can capture the possibly multi-line content.

NOTE: I also need it to work in single line defines.. So please keep that in mind.

Also I use .NET, So yes that's the flavor.

Any help/contribution is greatly appreciated. :D

We can include an optional backslash and line break:

(?:\\\r?\n[ \t]*)?
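
As a quick check of what this construct is meant to match, here is a minimal VB.NET sketch. It assumes the intended pattern is a literal backslash, an optional \r, a \n, and then any indentation. The module name ContinuationSketch and the sample strings are only illustrative, and the trailing "?" is left out so that IsMatch is not trivially true.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module ContinuationSketch
    Sub Main()
        ' The continuation construct on its own: "\" + optional CR + LF + indentation.
        ' VB string literals do not treat "\" as an escape, so the pattern is written as-is.
        Dim continuation As New Regex("\\\r?\n[ \t]*")

        ' Backslash followed by a Windows line break (CRLF) and indentation.
        Console.WriteLine(continuation.IsMatch("A \" & vbCrLf & "    B")) ' True

        ' Backslash followed by a Unix line break (LF).
        Console.WriteLine(continuation.IsMatch("A \" & vbLf & "B"))       ' True

        ' Backslash in the middle of a line is not a continuation.
        Console.WriteLine(continuation.IsMatch("A \ B"))                   ' False
    End Sub
End Module
```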

Then, to allow multiple lines ending in a backslash, we can repeat the following construct:

(?<value>(?>                  # Captures the DEFINE_VALUE
    [^\\\r\n;]+               #   Any char (except \, line breaks and ;)
  |                           #  or
    \\[^\r\n][^\\\r\n;]*      #   "\" within value
)+)?                          #  (~unrolling the loop)
(?:\\\r?\n[ \t]*)?            # allow "\" for new line

Code

Dim pattern As String = "^[ \t]*                    # beginning of line     " & vbCrLf &
                        "[#]define[ \t]+            # PAWN #define          " & vbCrLf &
                        "(?<name>[^\s\\;]+)         # DEFINE_NAME           " & vbCrLf &
                        "[ \t]*(?:\\\r?\n[ \t]*)?   # spaces and optional \ " & vbCrLf &
                        "(?>                        #                       " & vbCrLf &
                        "  (?<value>(?>             # DEFINE_VALUE          " & vbCrLf &
                        "    [^\\\r\n;]+   |        #  Any char -except \ \n" & vbCrLf &
                        "    \\[^\r\n][^\\\r\n;]*   #  \ within value       " & vbCrLf &
                        "  )+)?                     #  (~unrolling the loop)" & vbCrLf &
                        "  (?:\\\r?\n[ \t]*)?       # \ for new line        " & vbCrLf &
                        ")*                         # repeated for each line"

Dim re As Regex = new Regex( pattern, RegexOptions.Multiline Or
                                      RegexOptions.IgnorePatternWhitespace)
Dim text As String =    "#define DEFINE_NAME \"     & vbCrLf &
                        "       DEFINE_VALUE\"      & vbCrLf &
                        "       CONTINUE_VALUE"     & vbCrLf &
                        "#define TheName TheValue"
Dim mNum As Integer = 0
Dim matches As MatchCollection = re.Matches(text)

'Loop Matches
For Each match As Match In matches
    'get name
    Dim name As String = match.Groups("name").Value
    Console.WriteLine("Match #{0} - Name: {1}", mNum, name)

    'get values (in each capture)
    Dim captureCtr As Integer = 0
    For Each capture As Capture In match.Groups("value").Captures
        'loop captures for the Group "value"
        Console.WriteLine(vbTab & "Line #{0} - Value: {1}", 
                                captureCtr, capture.Value)
        captureCtr += 1              
    Next
    mNum += 1
Next

Output

Match #0 - Name: DEFINE_NAME
    Line #0 - Value: DEFINE_VALUE
    Line #1 - Value: CONTINUE_VALUE
Match #1 - Name: TheName
    Line #0 - Value: TheValue

  • Notice I'm using named groups, (?<name>..) and (?<value>..). That's why they are referenced in the code as match.Groups("name").

  • Also, the group (?<value>..) is repeated for each line. Groups("value").Value only holds the last captured substring, but the Captures property contains information about all the substrings captured by the group. This is a fairly unique feature of .NET regexes.
    That's the reason why I'm looping over match.Groups("value").Captures.
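
To make the difference between Value and Captures concrete, here is a minimal VB.NET sketch. It uses a deliberately simplified pattern (one "value" capture per word), not the PAWN pattern above. The module name CapturesSketch and the sample input are only illustrative.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module CapturesSketch
    Sub Main()
        ' Simplified pattern for illustration only: the "value" group is
        ' repeated once per word, separated by an optional single space.
        Dim re As New Regex("^(?:(?<value>\w+)[ ]?)+$")
        Dim m As Match = re.Match("DEFINE_VALUE CONTINUE_VALUE")

        ' Group.Value holds only the last substring the group captured.
        Console.WriteLine(m.Groups("value").Value)
        ' prints: CONTINUE_VALUE

        ' Group.Captures holds every substring the group captured, in order.
        Dim parts = m.Groups("value").Captures.Cast(Of Capture)().
                        Select(Function(c) c.Value)
        Console.WriteLine(String.Join(" ", parts))
        ' prints: DEFINE_VALUE CONTINUE_VALUE
    End Sub
End Module
```

The same Cast/Select/Join approach could be used on the "value" group of the #define pattern if the continued lines need to be stitched back into a single string.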