Parsing multiline ini like file using PCRE regex

Question

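I have an ini-like file that contains a list of <key> = <value> items. What complicates things is that some values are multiline and can contain the = character (a TLS private key). Example: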

groupid = foo
location = westus
randomkey = fbae3700c34cb06c
resourcename = example4-resourcegroup
tls_private_key = -----BEGIN RSA PRIVATE KEY-----
//stuff
-----END RSA PRIVATE KEY-----

foo = 123
faa = 223

The pattern I have so far is /^(.*?)\ \=\ (.*[^=]*)$/m. It works for every key except tls_private_key: because that value contains =, only part of it is captured.

Any suggestions?

Answer 1

You may use this regex with a lookahead:

^\h*(?<key>[\w-]+)\h*=\h*(?<value>[\s\S]*?)(?=\R\h*[\w-]+\h*=|\z)

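RegEx Details:

^: Start of a line (with the m flag enabled)
\h*: Match 0 or more horizontal whitespace characters
(?<key>[\w-]+): Named group key that matches 1+ word characters or hyphens
\h*: Match 0 or more horizontal whitespace characters
=: Match a =
\h*: Match 0 or more horizontal whitespace characters
(?<value>[\s\S]*?): Named group value that matches 0 or more of any character, including line breaks, as few as possible
(?=\R\h*[\w-]+\h*=|\z): Lookahead asserting that what follows is either a line break followed by a key and =, or the end of input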
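
To see the lookahead approach run end to end, here is a minimal sketch in Python rather than PHP. It is a port, not the answer's exact pattern: Python's `re` module does not support `\h` or `\R`, and `\z` is unavailable in older Python versions, so the sketch assumes `[ \t]`, `(?:\r\n|\n|\r)` and `\Z` are close enough stand-ins. The names `ENTRY` and `parse_ini_like`, and the `.strip()` call, are illustrative additions.

```python
import re

# Approximate Python port of the PCRE pattern (assumed substitutions):
#   \h -> [ \t]    \R -> (?:\r\n|\n|\r)    \z -> \Z
ENTRY = re.compile(
    r"^[ \t]*(?P<key>[\w-]+)[ \t]*=[ \t]*(?P<value>[\s\S]*?)"
    r"(?=(?:\r\n|\n|\r)[ \t]*[\w-]+[ \t]*=|\Z)",
    re.MULTILINE,
)

# Sample input copied from the question.
SAMPLE = """groupid = foo
location = westus
randomkey = fbae3700c34cb06c
resourcename = example4-resourcegroup
tls_private_key = -----BEGIN RSA PRIVATE KEY-----
//stuff
-----END RSA PRIVATE KEY-----

foo = 123
faa = 223
"""


def parse_ini_like(text):
    """Return a dict of key -> value; a value may span several lines."""
    # strip() is an extra step, not part of the pattern: the lazy value
    # group also swallows blank lines that precede the next key.
    return {m.group("key"): m.group("value").strip() for m in ENTRY.finditer(text)}


if __name__ == "__main__":
    for key, value in parse_ini_like(SAMPLE).items():
        print(f"{key!r}: {value!r}")
```

The value group stops only at a line break followed by something that looks like a key and =, so the blank line after the private key ends up inside the captured value until it is trimmed. Note that a value line made only of word characters followed by = (base64 padding, for instance) would be read as a new key by this pattern.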

Answer 2

You could match all the lines of a multiline value, asserting that each following line does not contain space, equals sign, space:

^(.*?) = (.*(?:\R(?!.*? = ).*)*)

If the key cannot contain spaces:

^([^\s=]+)\h+=\h+(.*(?:\R(?![^\s=]+\h+=\h+).*)*)$

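Explanation

^ Start of the line (the pattern relies on the m flag)
([^\s=]+) Capture group 1, match 1+ characters other than = or a whitespace character
\h+=\h+ Match = with 1+ horizontal whitespace characters on each side
( Capture group 2
- .* Match the whole line
- (?:\R(?![^\s=]+\h+=\h+).*)* Repeat matching each following line that does not start with a key followed by space = space
) Close group 2
$ End of the line (the pattern relies on the m flag)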
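
The line-by-line approach can be sketched the same way. This is a Python approximation of the second pattern, assuming `[ \t]` for `\h` and `(?:\r\n|\n|\r)` for `\R` since Python's `re` module lacks both. The sample is a shortened version of the question's input in which the `//stuff` line is replaced by a made-up placeholder ending in `=`, to show that such a line stays inside the value. `ENTRY` and `parse_entries` are hypothetical names.

```python
import re

# Approximate Python port (assumed substitutions):
#   \h -> [ \t]    \R -> (?:\r\n|\n|\r)
ENTRY = re.compile(
    r"^([^\s=]+)[ \t]+=[ \t]+"
    r"(.*(?:(?:\r\n|\n|\r)(?![^\s=]+[ \t]+=[ \t]+).*)*)$",
    re.MULTILINE,
)

# Shortened sample; "c3R1ZmY=" is a placeholder, not real key material.
SAMPLE = (
    "location = westus\n"
    "tls_private_key = -----BEGIN RSA PRIVATE KEY-----\n"
    "c3R1ZmY=\n"
    "-----END RSA PRIVATE KEY-----\n"
    "\n"
    "foo = 123\n"
)


def parse_entries(text):
    """Return a dict of key -> value using the continuation-line pattern."""
    # Group 2 also picks up trailing blank lines, so trim them here.
    return {m.group(1): m.group(2).strip() for m in ENTRY.finditer(text)}


if __name__ == "__main__":
    for key, value in parse_entries(SAMPLE).items():
        print(f"{key!r}: {value!r}")
```

Because the negative lookahead requires whitespace on both sides of =, the placeholder line `c3R1ZmY=` is not mistaken for a new key and is kept as part of tls_private_key.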

Answer 3

Another variant:

(?sm)^([^=\n]*)\s=\s(.*?)(?=\n[^=\n]*\s=\s|\z)

Explanation

--------------------------------------------------------------------------------
  (?ms)                    set flags for this block (with ^ and $
                           matching start and end of line) (with .
                           matching \n) (case-sensitive) (matching
                           whitespace and # normally)
--------------------------------------------------------------------------------
  ^                        the beginning of a "line"
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^=\n]*                  any character except: '=', '\n'
                             (newline) (0 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  =                        '='
--------------------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    .*?                      any character (0 or more times (matching
                             the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \n                       '\n' (newline)
--------------------------------------------------------------------------------
    [^=\n]*                  any character except: '=', '\n'
                             (newline) (0 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
    =                        '='
--------------------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \z                       the end of the string
--------------------------------------------------------------------------------
  )                        end of look-ahead
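
As a rough check of this variant, the sketch below runs it in Python against the question's sample. The only change to the pattern is `\Z` in place of `\z`, an assumption made because older Python versions do not accept `\z`; the inline `(?sm)` flags work unchanged. `ENTRY` and `parse_pairs` are hypothetical names, and the `.strip()` call is an addition.

```python
import re

# Same pattern as above, with \z swapped for \Z (assumed equivalent here).
ENTRY = re.compile(r"(?sm)^([^=\n]*)\s=\s(.*?)(?=\n[^=\n]*\s=\s|\Z)")

# Sample input copied from the question.
SAMPLE = """groupid = foo
location = westus
randomkey = fbae3700c34cb06c
resourcename = example4-resourcegroup
tls_private_key = -----BEGIN RSA PRIVATE KEY-----
//stuff
-----END RSA PRIVATE KEY-----

foo = 123
faa = 223
"""


def parse_pairs(text):
    """Return a dict of key -> value from group 1 and group 2."""
    # The lazy group 2 runs up to the newline before the next "key = ",
    # so it can end with a newline from a blank line; strip() trims that.
    return {m.group(1): m.group(2).strip() for m in ENTRY.finditer(text)}


if __name__ == "__main__":
    for key, value in parse_pairs(SAMPLE).items():
        print(f"{key!r}: {value!r}")
```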

php

regex

pcre