Parsing parameter from string | regex | php
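I'm having trouble parsing parameters from a string.
The parameters are defined as follows:
Can be written in short or long notation, e.g.:
-a / --long
Characters range over [a-z0-9] for the short notation and [a-z0-9\-] for the long notation, e.g.:
--long-with-dash
Can have a value, but doesn't have to, e.g.:
-a test / --aaaa
Can have multiple values without quotes, e.g.:
-a val1 val2
(should be captured as one group: value = "val1 val2")
Can have custom text inside quotes:
--custom "here can stand everything, --test test :( "
A parameter can have a "!" in front:
! --test test / ! -a
Values can contain "-":
-a value-with-dash
All these parameters are in one long string, e.g.:
-a val1 ! -b val2 --other "string with crazy -a --test stuff inside" --param-with-dash val1 val2 -test value-with-dash ! -c -d ! --test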
-- EDIT ----
Also --param value-with-dash
-- END EDIT ---
This is the closest I could get:
https://regex101.com/r/3aPHzp/1
/(?:(?P<inverted>\!) )?(?P<names>\-{1,2}\S+)($| (?P<values>.+(?=(?: [\!|\-])|$)))/U
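Unfortunately, it breaks on free-text values inside quotes, and when a parameter without a value is followed by the next parameter.
(I'm trying to parse the output of iptables-save, in case you're interested. Also, maybe I could split the string beforehand in some other clever way to avoid a huge regex, but I don't see it.)
Thanks a lot for your help!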
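The split-first idea from the question can be sketched without one large pattern: tokenize on spaces with PHP's `str_getcsv()`, which keeps double-quoted text together and strips the quotes, then walk the tokens and group them. This is a minimal sketch assuming PHP 7+, single spaces between tokens, double quotes only, and values that never start with `-`; `parse_params()` is a hypothetical helper name. Because the quotes are stripped, a quoted value that looks exactly like a parameter name (e.g. `"--x"`) would be misread as a new parameter.

```php
<?php
// Minimal sketch of "split first, then group", assuming:
// tokens separated by spaces, free text in double quotes,
// a bare "!" inverting the next parameter, and values never starting with "-".
// parse_params() is a hypothetical helper, not a library function.
function parse_params(string $str): array
{
    // str_getcsv() with a space separator keeps "quoted text" as one field
    // and strips the surrounding quotes
    $tokens = str_getcsv($str, ' ', '"', '\\');

    $params = [];
    $inverted = false;
    foreach ($tokens as $token) {
        if ($token === null || $token === '') {
            continue; // empty fields come from repeated spaces or an empty string
        }
        if ($token === '!') {
            $inverted = true; // applies to the next parameter name
            continue;
        }
        if (preg_match('/^--?[a-z0-9][a-z0-9-]*$/i', $token)) {
            // A parameter name starts a new record
            $params[] = ['inverted' => $inverted, 'name' => $token, 'values' => []];
            $inverted = false;
        } elseif ($params) {
            // Any other token is a value of the most recent parameter
            $params[count($params) - 1]['values'][] = $token;
        }
    }

    // Join multiple unquoted values with single spaces, e.g. "val1 val2"
    return array_map(function (array $p): array {
        $p['values'] = implode(' ', $p['values']);
        return $p;
    }, $params);
}

print_r(parse_params('-a val1 ! -b val2 --custom "x -a y" -c ! --test'));
```

Whether this beats a single regex mostly depends on how predictable the input is; iptables-save output is fairly regular, which is what makes a tokenizer plausible here.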
-- FINAL SOLUTION --
For PHP >= 5.6:
(?<inverted>!)?\s*(?<name>--?\w[\w-]*)\s*(?<values>(?:\s*(?:\w\S*|["'](?:[^"'\\]*(?:\\.[^"'\\]*)*)['"]))*)\K
Demo: https://regex101.com/r/xSfgxP/1
For PHP < 5.6:
(?<inverted>\!)?\s*(?<=(?:\s)|^)(?<name>\-{1,2}\w[\w\-]*)\s+(?<value>(?:\s*(?:\w\S*|["'](?:[^"'\\]*(?:\\.[^"'\\]*)*)['"]))*)
Regex:
(?<inverted>!)?\s*(?<name>--?\w[\w-]*)\s*(?<values>(?:\s*(?:\w\S+|["'](?:[^"'\\]*(?:\\.[^"'\\]*)*)['"]))*)\K
Live demo (updated)
Breakdown
(?<inverted> ! )? # (1) Named-capturing group for inverted result
\s* # Match any spaces
(?<name> --? \w [\w-]* ) # (2) Named-capturing group for parameter name
\s* # Match any spaces
(?<values> # (3 start) Named capturing group for values
(?: # Beginning of a non-capturing group (a)
\s* # Match any spaces
(?: # Beginning of a non-capturing group (b)
\w\S+ # Match a [a-zA-Z0-9_] character then one or more non-whitespace characters
| # Or
["'] # Match a quotation mark
(?: # Beginning of a non-capturing group (c)
[^"'\\]* # Match anything except `"`, `'` or `\`
(?: \\ . [^"'\\]* )* # Match an escaped character then anything except `"`, `'` or `\` as much as possible
) # End of non-capturing group (c)
['"] # Match a closing quotation mark (either kind, not necessarily the same as the opening one)
) # End of non-capturing group (b)
)* # Greedy (a), end of non-capturing group (a)
) # (3 end)
\K # Reset the start of the reported match, so the overall match (index 0) is an empty string
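The unquoted-value branch differs between the patterns in this thread: the first final-solution pattern uses `\w\S*`, while the breakdown uses `\w\S+`, which needs at least two characters. A small comparison shows the effect on a one-character value; the iptables-style input `-p 6 -j ACCEPT` is an assumption chosen for illustration.

```php
<?php
// Same pattern twice, differing only in the unquoted-value branch:
// \w\S* accepts one-character values, \w\S+ needs at least two characters.
$withStar = <<<'RE'
~(?<inverted>!)?\s*(?<name>--?\w[\w-]*)\s*(?<values>(?:\s*(?:\w\S*|["'](?:[^"'\\]*(?:\\.[^"'\\]*)*)['"]))*)\K~
RE;
$withPlus = <<<'RE'
~(?<inverted>!)?\s*(?<name>--?\w[\w-]*)\s*(?<values>(?:\s*(?:\w\S+|["'](?:[^"'\\]*(?:\\.[^"'\\]*)*)['"]))*)\K~
RE;

// Sample input (an assumption): a one-character value followed by a longer one
$str = '-p 6 -j ACCEPT';

// Default PREG_PATTERN_ORDER: $star['values'] lists the values of every match
preg_match_all($withStar, $str, $star);
preg_match_all($withPlus, $str, $plus);

print_r($star['values']); // values captured with \w\S*
print_r($plus['values']); // values captured with \w\S+
```

With `\w\S+` the `6` is not captured as the value of `-p` (its values group is empty), so if one-character values can occur, the `\w\S*` form is likely what you want.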
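PHP code:
<?php
$str = '-a val1 ! -b val2 --custom "string :)(#with crazy -a --test stuff inside" --param-with-dash val1 val2 -c ! -d ! --test';
// Nowdoc ('RE'): backslashes reach PCRE unchanged, so \\ matches one literal backslash
$re = <<< 'RE'
~(?<inverted>!)?\s*(?<name>--?\w[\w-]*)\s*(?<values>(?:\s*(?:\w\S+|["'](?:[^"'\\]*(?:\\.[^"'\\]*)*)['"]))*)\K~
RE;
// PREG_SET_ORDER: one array per parameter, holding both named and numbered groups
preg_match_all($re, $str, $matches, PREG_SET_ORDER);
// array_filter() drops empty strings: the overall match (emptied by \K) and empty groups
print_r(array_map('array_filter', $matches));
Output: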
Array
(
[0] => Array
(
[name] => -a
[2] => -a
[values] => val1
[3] => val1
)
[1] => Array
(
[inverted] => !
[1] => !
[name] => -b
[2] => -b
[values] => val2
[3] => val2
)
[2] => Array
(
[name] => --custom
[2] => --custom
[values] => "string :)(#with crazy -a --test stuff inside"
[3] => "string :)(#with crazy -a --test stuff inside"
)
[3] => Array
(
[name] => --param-with-dash
[2] => --param-with-dash
[values] => val1 val2
[3] => val1 val2
)
[4] => Array
(
[name] => -c
[2] => -c
)
[5] => Array
(
[inverted] => !
[1] => !
[name] => -d
[2] => -d
)
[6] => Array
(
[inverted] => !
[1] => !
[name] => --test
[2] => --test
)
)
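The filtered arrays above have different keys per entry. As a sketch (assuming PHP 7+; `to_records()` is a hypothetical helper, not part of the answer), the same pattern can be run on the question's own example string, including the `value-with-dash` case, and each match mapped to a record with fixed keys:

```php
<?php
// Sketch: the answer's pattern, with each match turned into a record that
// always has the keys inverted / name / values.
// to_records() is a hypothetical helper name.
function to_records(string $str): array
{
    $re = <<<'RE'
~(?<inverted>!)?\s*(?<name>--?\w[\w-]*)\s*(?<values>(?:\s*(?:\w\S+|["'](?:[^"'\\]*(?:\\.[^"'\\]*)*)['"]))*)\K~
RE;
    preg_match_all($re, $str, $matches, PREG_SET_ORDER);

    $records = [];
    foreach ($matches as $m) {
        $records[] = [
            'inverted' => ($m['inverted'] ?? '') === '!', // unmatched group comes back as ""
            'name'     => $m['name'],
            'values'   => $m['values'] ?? '',             // "" when no value follows
        ];
    }
    return $records;
}

print_r(to_records('-a val1 ! -b val2 --other "string with crazy -a --test stuff inside" --param-with-dash val1 val2 -test value-with-dash ! -c -d ! --test'));
```

Quoted values keep their surrounding quotes (e.g. `"string with crazy -a --test stuff inside"`), so strip them afterwards if needed.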