Parsing a multiline ini-like file using PCRE regex

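I have an ini-like file containing a list of <key> = <value> items. What complicates things is that some values span multiple lines and can contain the = character (TLS private keys). Example: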

groupid = foo
location = westus
randomkey = fbae3700c34cb06c
resourcename = example4-resourcegroup
tls_private_key = -----BEGIN RSA PRIVATE KEY-----
//stuff
-----END RSA PRIVATE KEY-----

foo = 123
faa = 223

The pattern I have so far is /^(.*?)\ \=\ (.*[^=]*)$/m. It works for every key except tls_private_key: because that value contains =, the pattern only captures part of it.

Any suggestions?
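The failure described in the question can be reproduced with a short Python sketch. It is an assumption-laden port: the /m modifier becomes re.MULTILINE, and "==" is appended to the "//stuff" placeholder (hypothetical data) so that the key body really contains =, as the question says the real keys do.

```python
import re

# Sample from the question. Assumption: "==" is appended to the "//stuff"
# placeholder so the key body actually contains "=", as the question describes.
SAMPLE = "\n".join([
    "groupid = foo",
    "location = westus",
    "randomkey = fbae3700c34cb06c",
    "resourcename = example4-resourcegroup",
    "tls_private_key = -----BEGIN RSA PRIVATE KEY-----",
    "//stuff==",
    "-----END RSA PRIVATE KEY-----",
    "",
    "foo = 123",
    "faa = 223",
])

# The asker's pattern; the /m modifier becomes re.MULTILINE.
ORIGINAL = re.compile(r"^(.*?)\ \=\ (.*[^=]*)$", re.MULTILINE)


def parse_original(text: str) -> dict:
    """Return {key: value} for every match of the asker's pattern."""
    return {m.group(1): m.group(2) for m in ORIGINAL.finditer(text)}


if __name__ == "__main__":
    for key, value in parse_original(SAMPLE).items():
        print(f"{key} -> {value!r}")
```

Here [^=]* stops at the first = inside the key body, and the engine then backtracks to the nearest line end, so tls_private_key comes back as only the BEGIN line, while the //stuff== and END lines are not part of any match.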

You can use this regex with a lookahead:

^\h*(?<key>[\w-]+)\h*=\h*(?<value>[\s\S]*?)(?=\R\h*[\w-]+\h*=|\z)

Regex details:

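  • ^: Start of a line
  • \h*: 0 or more horizontal whitespace characters
  • (?<key>[\w-]+): Group key, matching 1+ word characters or hyphens
  • \h*: 0 or more horizontal whitespace characters
  • =: Match a literal =
  • \h*: 0 or more horizontal whitespace characters
  • (?<value>[\s\S]*?): Group value, lazily matching 0 or more characters of any kind, including line breaks
  • (?=\R\h*[\w-]+\h*=|\z): Lookahead asserting that what follows is either a line break followed by a key and =, or the end of the input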
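A minimal Python sketch of the lookahead approach above, assuming Python's re module: it has no \h, \R or \z, so they are approximated here by [ \t], \r?\n and \Z (Python's \Z matches only at the very end of the string), and the named groups use Python's (?P<name>...) syntax. The sample is the question's, with "==" appended to the hypothetical "//stuff" line so the key body contains =.

```python
import re

# Port of the lookahead regex. Assumptions: \h -> [ \t], \R -> \r?\n,
# \z -> \Z, and (?<name>...) -> (?P<name>...).
PATTERN = re.compile(
    r"^[ \t]*(?P<key>[\w-]+)[ \t]*=[ \t]*(?P<value>[\s\S]*?)"
    r"(?=\r?\n[ \t]*[\w-]+[ \t]*=|\Z)",
    re.MULTILINE,
)

# Question's sample; "==" on the "//stuff" line is an assumption so the
# private key body contains "=".
SAMPLE = "\n".join([
    "groupid = foo",
    "location = westus",
    "randomkey = fbae3700c34cb06c",
    "resourcename = example4-resourcegroup",
    "tls_private_key = -----BEGIN RSA PRIVATE KEY-----",
    "//stuff==",
    "-----END RSA PRIVATE KEY-----",
    "",
    "foo = 123",
    "faa = 223",
])


def parse(text: str) -> dict:
    """Map each key to its (possibly multi-line) value."""
    # The lazy value group can end after the blank line that precedes the
    # next key, so trailing whitespace is stripped from each value.
    return {m["key"]: m["value"].rstrip() for m in PATTERN.finditer(text)}


if __name__ == "__main__":
    print(parse(SAMPLE)["tls_private_key"])
```

Because the value group is lazy, it grows one character at a time and stops at the first position where the next line starts with key =, so the = inside the key body no longer cuts the value short.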

You could match all lines of a value, asserting that each following line does not contain space-equals-space:

^(.*?) = (.*(?:\R(?!.*? = ).*)*)
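A small Python sketch of this pattern, assuming multiline mode (re.MULTILINE) and replacing PCRE's \R with \r?\n because Python's re does not support \R. The sample is the question's, with "==" appended to the hypothetical "//stuff" line.

```python
import re

# Port of ^(.*?) = (.*(?:\R(?!.*? = ).*)*); assumption: \R -> \r?\n.
PATTERN = re.compile(r"^(.*?) = (.*(?:\r?\n(?!.*? = ).*)*)", re.MULTILINE)

# Question's sample; "==" on the "//stuff" line is an assumption so the
# private key body contains "=".
SAMPLE = "\n".join([
    "groupid = foo",
    "location = westus",
    "randomkey = fbae3700c34cb06c",
    "resourcename = example4-resourcegroup",
    "tls_private_key = -----BEGIN RSA PRIVATE KEY-----",
    "//stuff==",
    "-----END RSA PRIVATE KEY-----",
    "",
    "foo = 123",
    "faa = 223",
])


def parse(text: str) -> dict:
    """Group 1 is the key; group 2 is the first line plus every following
    line that does not contain " = "."""
    # The blank line before "foo = 123" is absorbed into group 2, so
    # trailing whitespace is stripped.
    return {m.group(1): m.group(2).rstrip() for m in PATTERN.finditer(text)}


if __name__ == "__main__":
    print(parse(SAMPLE)["tls_private_key"])
```

Since . does not cross line breaks here, the negative lookahead only inspects one line at a time, which is why a bare = inside the key (with no spaces around it) does not stop the value.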

If the key cannot contain spaces:

^([^\s=]+)\h+=\h+(.*(?:\R(?![^\s=]+\h+=\h+).*)*)$

Explanation

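  • ^ Start of a line (the pattern relies on multiline mode)
  • ([^\s=]+) Capture group 1, match 1+ characters that are neither = nor whitespace
  • \h+=\h+ Match an = between horizontal whitespace
  • ( Capture group 2
    • .* Match the rest of the line
    • (?:\R(?![^\s=]+\h+=\h+).*)* Repeatedly match the following lines, as long as a line does not start with a key followed by space = space
  • ) Close group 2
  • $ End of a line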
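A minimal Python sketch of the "key without spaces" pattern, assuming multiline mode and approximating PCRE's \h with [ \t] and \R with \r?\n, since Python's re supports neither. The sample is the question's, with "==" appended to the hypothetical "//stuff" line.

```python
import re

# Port of ^([^\s=]+)\h+=\h+(.*(?:\R(?![^\s=]+\h+=\h+).*)*)$
# Assumptions: \h -> [ \t], \R -> \r?\n.
PATTERN = re.compile(
    r"^([^\s=]+)[ \t]+=[ \t]+"
    r"(.*(?:\r?\n(?![^\s=]+[ \t]+=[ \t]+).*)*)$",
    re.MULTILINE,
)

# Question's sample; "==" on the "//stuff" line is an assumption so the
# private key body contains "=".
SAMPLE = "\n".join([
    "groupid = foo",
    "location = westus",
    "randomkey = fbae3700c34cb06c",
    "resourcename = example4-resourcegroup",
    "tls_private_key = -----BEGIN RSA PRIVATE KEY-----",
    "//stuff==",
    "-----END RSA PRIVATE KEY-----",
    "",
    "foo = 123",
    "faa = 223",
])


def parse(text: str) -> dict:
    """Key = group 1 (no spaces, no "="); value = group 2, continued over
    every line that does not start with another key and " = "."""
    # The blank line before the next key ends up in group 2; strip it.
    return {m.group(1): m.group(2).rstrip() for m in PATTERN.finditer(text)}


if __name__ == "__main__":
    print(parse(SAMPLE)["tls_private_key"])
```

Compared with the previous pattern, the lookahead only rejects lines that start with a key-shaped token, so a continuation line may still contain " = " somewhere in the middle.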

Another variant:

(?sm)^([^=\n]*)\s=\s(.*?)(?=\n[^=\n]*\s=\s|\z)

Explanation

--------------------------------------------------------------------------------
  (?sm)                    set flags for this block (with ^ and $
                           matching start and end of line) (with .
                           matching \n) (case-sensitive) (matching
                           whitespace and # normally)
--------------------------------------------------------------------------------
  ^                        the beginning of a "line"
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^=\n]*                  any character except: '=', '\n'
                             (newline) (0 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  =                        '='
--------------------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    .*?                      any character (0 or more times (matching
                             the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \n                       '\n' (newline)
--------------------------------------------------------------------------------
    [^=\n]*                  any character except: '=', '\n'
                             (newline) (0 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
    =                        '='
--------------------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \z                       the end of the string
--------------------------------------------------------------------------------
  )                        end of look-ahead
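This variant ports to Python almost unchanged, assuming Python's re: inline (?sm) flags at the start of the pattern are accepted, and \z becomes \Z, which in Python matches only at the very end of the string. The sketch below is illustrative, using the question's sample with "==" appended to the hypothetical "//stuff" line.

```python
import re

# Port of (?sm)^([^=\n]*)\s=\s(.*?)(?=\n[^=\n]*\s=\s|\z); assumption: \z -> \Z.
PATTERN = re.compile(r"(?sm)^([^=\n]*)\s=\s(.*?)(?=\n[^=\n]*\s=\s|\Z)")

# Question's sample; "==" on the "//stuff" line is an assumption so the
# private key body contains "=".
SAMPLE = "\n".join([
    "groupid = foo",
    "location = westus",
    "randomkey = fbae3700c34cb06c",
    "resourcename = example4-resourcegroup",
    "tls_private_key = -----BEGIN RSA PRIVATE KEY-----",
    "//stuff==",
    "-----END RSA PRIVATE KEY-----",
    "",
    "foo = 123",
    "faa = 223",
])


def parse(text: str) -> dict:
    """With re.DOTALL the lazy .*? may cross newlines; it stops before the
    next line of the form "key = ..." or at the end of the string."""
    # The blank line before "foo = 123" is captured in group 2; strip it.
    return {m.group(1): m.group(2).rstrip() for m in PATTERN.finditer(text)}


if __name__ == "__main__":
    print(parse(SAMPLE)["tls_private_key"])
```

The s flag is what lets .*? span the key's lines here, whereas the two patterns before it build the value line by line with an explicit line-break token.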