Parsing text between quotes with .NET regular expressions

Question

I have the following input text:

@"This is some text @foo=bar @name=""John \""The Anonymous One\"" Doe"" @age=38"

I want to parse the values written in @name=value syntax into name/value pairs. Parsing the string above should produce the following named captures:

name:"foo"
value:"bar"

name:"name"
value:"John \""The Anonymous One\"" Doe"

name:"age"
value:"38"

I tried the following regular expression, which almost works:

@"(?:(?<=\s)|^)@(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>[A-Za-z0-9_-]+|(?="").+?(?=(?<!\\)""))"

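The main problem is that it captures the opening quote in "John \""The Anonymous One\"" Doe". I feel the (?="") should be a lookbehind rather than a lookahead, but that does not seem to work at all.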

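Some rules for the expression:

Names must start with a letter and may contain any letter, digit, underscore, or hyphen.
Unquoted values must have at least one character and may contain any letter, digit, underscore, or hyphen.
Quoted values may contain any character, including whitespace and escaped quotes.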
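
For illustration, the three rules above can be read as the pattern sketched below. This is not code from the thread: the class name `RuleBasedParser` and its `Parse` method are hypothetical, and the sketch swaps the lookarounds for a different technique, consuming the surrounding quotes outside the `value` group and stepping over backslash escapes with `\\.`. It assumes a backslash always escapes the character that follows it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: one possible reading of the three rules, not the asker's code.
public static class RuleBasedParser
{
    // name:     a letter, then letters, digits, underscores or hyphens
    // quoted:   "..." where \x (backslash + any char) counts as one escaped unit
    // unquoted: one or more letters, digits, underscores or hyphens
    private static readonly Regex Pattern = new Regex(
        @"(?:(?<=\s)|^)@(?<name>[A-Za-z][A-Za-z0-9_-]*)\s*=\s*" +
        @"(?:""(?<value>(?:\\.|[^""\\])*)""|(?<value>[A-Za-z0-9_-]+))",
        RegexOptions.Singleline);

    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in Pattern.Matches(input))
        {
            // Both alternatives use the group name "value"; only one of them participates per match.
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups["name"].Value, m.Groups["value"].Value));
        }
        return pairs;
    }

    public static void Main()
    {
        string input = @"This is some text @foo=bar @name=""John \""The Anonymous One\"" Doe"" @age=38";
        foreach (var pair in Parse(input))
        {
            Console.WriteLine("name:" + pair.Key + " value:" + pair.Value);
        }
    }
}
```

Because the opening and closing quotes are matched outside the named group, the captured value excludes them while the escaped quotes inside are kept verbatim.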

Edit:

Here is the result from regex101.com:

(?:(?<=\s)|^)@(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)"))

(?:(?<=\s)|^) Non-capturing group
@ matches the character @ literally
(?<name>\w+[A-Za-z0-9_-]+?) Named capturing group name
\s* match any white space character [\r\n\t\f ]
= matches the character = literally
\s* match any white space character [\r\n\t\f ]
    Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)")) Named capturing group value
    1st Alternative: [A-Za-z0-9_-]+
        [A-Za-z0-9_-]+ match a single character present in the list below
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
            A-Z a single character in the range between A and Z (case sensitive)
            a-z a single character in the range between a and z (case sensitive)
            0-9 a single character in the range between 0 and 9
            _- a single character in the list _- literally
    2nd Alternative: (?=").+?(?=(?<!\\)")
        (?=") Positive Lookahead - Assert that the regex below can be matched
            " matches the characters " literally
        .+? matches any character (except newline)
            Quantifier: +? Between one and unlimited times, as few times as possible, expanding as needed [lazy]
        (?=(?<!\\)") Positive Lookahead - Assert that the regex below can be matched
            (?<!\\) Negative Lookbehind - Assert that it is impossible to match the regex below
                \\ matches the character \ literally
            " matches the characters " literally

Answer 1

Use string methods.

Split

string myLongString = @"This is some text @foo=bar @name=""John \""The Anonymous One\"" Doe"" @age=38";

string[] nameValues = myLongString.Split('@');

From there, either call Split again with '=' or use IndexOf("=") to separate each name from its value.
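
A minimal sketch of how this split-based approach might be completed is shown below. The class name `SplitParser` and its `Parse` method are hypothetical, and stripping the surrounding quotes is an assumption added here to match the expected output in the question. Splitting on '@' will mis-parse any quoted value that itself contains '@', so treat this as an illustration rather than a robust parser.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating the Split / IndexOf idea.
public static class SplitParser
{
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        string[] segments = input.Split('@');

        // segments[0] is whatever precedes the first '@', so start at index 1.
        for (int i = 1; i < segments.Length; i++)
        {
            string segment = segments[i];
            int eq = segment.IndexOf('=');
            if (eq <= 0)
            {
                continue; // no '=' or an empty name: not a name/value pair
            }

            string name = segment.Substring(0, eq).Trim();
            string value = segment.Substring(eq + 1).Trim();

            // Assumption: drop one pair of surrounding quotes, keep escaped quotes inside.
            if (value.Length >= 2 && value[0] == '"' && value[value.Length - 1] == '"')
            {
                value = value.Substring(1, value.Length - 2);
            }

            pairs.Add(new KeyValuePair<string, string>(name, value));
        }
        return pairs;
    }

    public static void Main()
    {
        string myLongString = @"This is some text @foo=bar @name=""John \""The Anonymous One\"" Doe"" @age=38";
        foreach (var pair in Parse(myLongString))
        {
            Console.WriteLine("name:" + pair.Key + " value:" + pair.Value);
        }
    }
}
```

Unlike the regular expression in the question, this sketch does not validate the characters allowed in names or unquoted values.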

Answer 2

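You can take advantage of a very useful .NET regex feature: multiple capture groups may share the same name. Also, your (?<name>) capture group has a problem: it allows a digit in the first position, which does not meet your first requirement.

So, I suggest: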

(?si)(?:(?<=\s)|^)@(?<name>\w+[a-z0-9_-]+?)\s*=\s*(?:(?<value>[a-z0-9_-]+)|(?:"")?(?<value>.+?)(?=(?<!\\)""))

See the demo.

Note that you cannot debug .NET-specific regular expressions at regex101.com; you need to test them in a .NET-compatible environment.
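
One way to try the suggested pattern in a .NET environment is a small console program such as the sketch below. The class name `NameValueParser` and its `Parse` method are hypothetical. The pattern is the one from this answer, written as a C# verbatim string, so each `""` stands for a single quote character.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical test harness for the pattern suggested in this answer.
public static class NameValueParser
{
    // Two groups are both named "value"; .NET permits duplicate group names.
    private static readonly Regex Pattern = new Regex(
        @"(?si)(?:(?<=\s)|^)@(?<name>\w+[a-z0-9_-]+?)\s*=\s*(?:(?<value>[a-z0-9_-]+)|(?:"")?(?<value>.+?)(?=(?<!\\)""))");

    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in Pattern.Matches(input))
        {
            // Groups["value"] holds whichever alternative actually matched.
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups["name"].Value, m.Groups["value"].Value));
        }
        return pairs;
    }

    public static void Main()
    {
        string input = @"This is some text @foo=bar @name=""John \""The Anonymous One\"" Doe"" @age=38";
        foreach (var pair in Parse(input))
        {
            Console.WriteLine("name:" + pair.Key + " value:" + pair.Value);
        }
    }
}
```

The opening quote is consumed by the optional non-capturing group `(?:"")?` before the second `value` group starts, which is why it no longer appears in the capture.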

Tags: .net, c#, regex, lookahead, lookbehind