Parsing a multiline ini-like file using PCRE regex
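I have an ini-like file that contains a list of <key> = <value> items. What complicates things is that some values span multiple lines and can contain the = character (a TLS private key, for example).
Example: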
groupid = foo
location = westus
randomkey = fbae3700c34cb06c
resourcename = example4-resourcegroup
tls_private_key = -----BEGIN RSA PRIVATE KEY-----
//stuff
-----END RSA PRIVATE KEY-----
foo = 123
faa = 223
The pattern I have so far is /^(.*?)\ \=\ (.*[^=]*)$/m. It works for every key except tls_private_key: because that value contains =, only part of it is captured.
Any suggestions?
You may use this regex with a lookahead:
^\h*(?<key>[\w-]+)\h*=\h*(?<value>[\s\S]*?)(?=\R\h*[\w-]+\h*=|\z)
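Regex details:
^ : start of a line (with the m flag, as in the question's pattern)
\h* : 0 or more horizontal whitespace characters
(?<key>[\w-]+) : group named key, matching 1 or more word characters or hyphens
\h* : 0 or more horizontal whitespace characters
= : match a literal =
\h* : 0 or more horizontal whitespace characters
(?<value>[\s\S]*?) : group named value, lazily matching 0 or more of any character, including newlines
(?=\R\h*[\w-]+\h*=|\z) : lookahead asserting that what follows is either a line break followed by a key and =, or the end of input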
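To make the breakdown above concrete, here is a minimal sketch that runs the lookahead pattern against the sample from the question. It is written in Python rather than PCRE, and Python's re module has no \h, \R or \z, so the pattern is a hand translation ([ \t] for \h, \r?\n for \R, \Z for \z). These are approximations, not exact equivalents. The names ENTRY_RE, SAMPLE and parse_entries are made up for this illustration.

```python
import re

# Translation of the PCRE pattern for Python's re module (an assumption,
# not the answer's original): \h -> [ \t], \R -> \r?\n, \z -> \Z.
ENTRY_RE = re.compile(
    r"^[ \t]*(?P<key>[\w-]+)[ \t]*=[ \t]*(?P<value>[\s\S]*?)"
    r"(?=\r?\n[ \t]*[\w-]+[ \t]*=|\Z)",
    re.MULTILINE,  # lets ^ match at the start of every line
)

# The sample from the question, without a trailing newline.
SAMPLE = "\n".join([
    "groupid = foo",
    "location = westus",
    "randomkey = fbae3700c34cb06c",
    "resourcename = example4-resourcegroup",
    "tls_private_key = -----BEGIN RSA PRIVATE KEY-----",
    "//stuff",
    "-----END RSA PRIVATE KEY-----",
    "foo = 123",
    "faa = 223",
])


def parse_entries(text):
    """Return a dict mapping each key to its (possibly multiline) value."""
    return {m.group("key"): m.group("value") for m in ENTRY_RE.finditer(text)}


entries = parse_entries(SAMPLE)
print(entries["tls_private_key"])
```

The lazy value group grows one character at a time until the lookahead sees the next "key =" line or the end of the input, which is why the three lines of the private key end up in a single value.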
You can match all the lines of a multiline value by asserting that the next line does not contain space, equals sign, space:
^(.*?) = (.*(?:\R(?!.*? = ).*)*)
If the key cannot contain spaces:
^([^\s=]+)\h+=\h+(.*(?:\R(?![^\s=]+\h+=\h+).*)*)$
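Explanation
^ : start of the line (the pattern relies on the m flag, as in the question)
([^\s=]+) : capture group 1, match 1 or more characters other than = or a whitespace character
\h+=\h+ : match an = with 1 or more horizontal whitespace characters on each side
( : capture group 2
.* : match the rest of the line
(?:\R(?![^\s=]+\h+=\h+).*)* : repeat for every following line that does not start with a key followed by whitespace, =, whitespace
) : close capture group 2
$ : end of the line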
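As a rough check of the no-spaces-in-key pattern, the sketch below feeds it a value whose continuation line ends in =. The pattern is again a hand translation for Python's re module ([ \t] for \h, \r?\n for \R), and it assumes \n line endings. The line c3R1ZmY= is a hypothetical stand-in for a base64 line with padding; it is not part of the question's sample. PAIR_RE and TEXT are illustrative names.

```python
import re

# Python translation of the second pattern (assumed equivalents:
# \h -> [ \t], \R -> \r?\n). MULTILINE makes ^ and $ work per line.
PAIR_RE = re.compile(
    r"^([^\s=]+)[ \t]+=[ \t]+(.*(?:\r?\n(?![^\s=]+[ \t]+=[ \t]+).*)*)$",
    re.MULTILINE,
)

# Shortened sample; "c3R1ZmY=" is a made-up base64-like line ending in "=".
TEXT = "\n".join([
    "location = westus",
    "tls_private_key = -----BEGIN RSA PRIVATE KEY-----",
    "c3R1ZmY=",
    "-----END RSA PRIVATE KEY-----",
    "foo = 123",
])

# findall returns (key, value) tuples because the pattern has two groups.
pairs = dict(PAIR_RE.findall(TEXT))

for key, value in pairs.items():
    print(key, "->", value.splitlines())
```

The continuation line is kept as part of the value because its = has no whitespace on either side, so the negative lookahead does not mistake it for the start of a new entry.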
Another variant:
(?sm)^([^=\n]*)\s=\s(.*?)(?=\n[^=\n]*\s=\s|\z)
Explanation
NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?sm)                    set flags for this block (with ^ and $
                           matching start and end of line) (with .
                           matching \n) (case-sensitive) (matching
                           whitespace and # normally)
--------------------------------------------------------------------------------
  ^                        the beginning of a "line"
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^=\n]*                  any character except: '=', '\n'
                             (newline) (0 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  =                        '='
--------------------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    .*?                      any character (0 or more times (matching
                             the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \n                       '\n' (newline)
--------------------------------------------------------------------------------
    [^=\n]*                  any character except: '=', '\n'
                             (newline) (0 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
    =                        '='
--------------------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \z                       the end of the string
--------------------------------------------------------------------------------
  )                        end of look-ahead
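The table above can be exercised with a short sketch. It uses Python's re module, where the only change needed is \Z in place of \z (the inline (?sm) flags carry over as written). Because group 1 is [^=\n]*, this variant also accepts keys that contain spaces; the line "display name = example" is a hypothetical entry added to show that, and it is not in the question's sample. VARIANT_RE and DATA are illustrative names.

```python
import re

# Same pattern as above with \z replaced by \Z for Python's re module.
# (?sm): "." also matches newlines, and "^" matches at every line start.
VARIANT_RE = re.compile(r"(?sm)^([^=\n]*)\s=\s(.*?)(?=\n[^=\n]*\s=\s|\Z)")

# Shortened sample; "display name" is a made-up key containing a space.
DATA = "\n".join([
    "groupid = foo",
    "display name = example",
    "tls_private_key = -----BEGIN RSA PRIVATE KEY-----",
    "//stuff",
    "-----END RSA PRIVATE KEY-----",
    "faa = 223",
])

# Each match yields (key, value); dict() keeps them in file order.
result = dict(VARIANT_RE.findall(DATA))

for key, value in result.items():
    print(repr(key), "->", repr(value))
```

Since \s also matches a newline, this variant is a little more permissive than the \h-based patterns about what counts as whitespace around the =.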