Split string by spaces and colon but not if inside quotes
I have a string like this:
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";
The expected result is:
[0] => Array (
[0] => dateto:'2015-10-07 15:05'
[1] => xxxx
[2] => datefrom:'2015-10-09 15:05'
[3] => yyyy
[4] => asdf
)
What I have so far:
preg_match_all("/\'(?:[^()]|(?R))+\'|'[^']*'|[^(),\s]+/", $str, $m);
which gives:
[0] => Array (
[0] => dateto:'2015-10-07
[1] => 15:05'
[2] => xxxx
[3] => datefrom:'2015-10-09
[4] => 15:05'
[5] => yyyy
[6] => asdf
)
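I also tried preg_split("/[\s]+/", $str), but I don't know how to keep it from splitting when the value is between quotes. Can anyone show me how, and please explain the regex? Thank you!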
I would use the PCRE verbs (*SKIP)(*F):
preg_split("~'[^']*'(*SKIP)(*F)|\s+~", $str);
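To make the one-liner above easier to follow, here is a minimal runnable sketch of the same call applied to the string from the question. The variable name $parts is only illustrative, and the comments describe how the two alternatives are generally understood to behave.

```php
<?php
// Sample string from the question.
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

// '[^']*'(*SKIP)(*F) : match a whole single-quoted run, then force a failure
//                      and forbid retrying inside it, so the whitespace it
//                      contains can never become a split point
// \s+                : everywhere else, split on a run of whitespace
$parts = preg_split("~'[^']*'(*SKIP)(*F)|\s+~", $str);

print_r($parts);
```

With the sample string this should produce the five expected items, with both quoted dates left intact.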
For your example, you can use preg_split with a negative lookbehind (?<!\d), i.e.:
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";
$matches = preg_split('/(?<!\d)(\s)/', $str);
print_r($matches);
Output:
Array
(
[0] => dateto:'2015-10-07 15:05'
[1] => xxxx
[2] => datefrom:'2015-10-09 15:05'
[3] => yyyy
[4] => asdf
)
Regex explanation:
(?<!\d)(\s)
Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!\d)»
Match a single character that is a “digit” «\d»
Match the regex below and capture its match into backreference number 1 «(\s)»
Match a single character that is a “whitespace character” «\s»
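The lookbehind works on the sample because every space inside the quotes happens to follow a digit. The short sketch below uses a hypothetical input, with an invented unquoted token room42 that also ends in a digit, to show what the same pattern does in that case. It is only an illustration of the pattern's scope.

```php
<?php
// Hypothetical input: the unquoted token "room42" ends in a digit.
$str = "room42 dateto:'2015-10-07 15:05' xxxx";

// Same pattern as above: whitespace is a split point unless a digit precedes it.
$parts = preg_split('/(?<!\d)(\s)/', $str);

print_r($parts);
```

Here the space after room42 is not a split point either, so the first item comes back as room42 dateto:'2015-10-07 15:05'. If unquoted tokens can end in digits, one of the quote-aware patterns is probably the safer choice.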
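In general, when you want to split a string, preg_split is not the best approach (this may seem counter-intuitive, but it holds most of the time). A more efficient way is to find all the items with preg_match_all, using a pattern that describes everything that is not the delimiter (whitespace here):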
$pattern = <<<'EOD'
~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
EOD;
if (preg_match_all($pattern, $str, $m))
$result = $m[0];
Pattern details:
~                        # pattern delimiter
(?=\S)                   # the lookahead assertion only succeeds if there is a
                         # non-white-space character at the current position.
                         # (This lookahead is useful for two reasons:
                         #  - it allows the regex engine to quickly find the start
                         #    of the next item without having to test each branch
                         #    of the following alternation at each position in the
                         #    string until one succeeds.
                         #  - it ensures that there's at least one non-white-space
                         #    character. Without it, the pattern may match an
                         #    empty string.
                         # )
[^'"\s]*                 # all that is not a quote or a white-space
(?:                      # eventual quoted parts
    '[^']*' [^'"\s]*     # single quotes
  |
    "[^"]*" [^'"\s]*     # double quotes
)*
~
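The annotated layout can also be run as written if the x (free-spacing) modifier is added, since PCRE then ignores unescaped whitespace and # comments inside the pattern. A minimal sketch, assuming the sample string from the question, with shortened comments:

```php
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

// Nowdoc keeps quotes and backslashes literal; the trailing x flag makes
// PCRE ignore the layout whitespace and the # comments in the pattern.
$pattern = <<<'EOD'
~
(?=\S)                  # an item must start with a non-whitespace character
[^'"\s]*                # unquoted characters
(?:
    '[^']*' [^'"\s]*    # a single-quoted part, then more unquoted characters
  |
    "[^"]*" [^'"\s]*    # a double-quoted part, then more unquoted characters
)*
~x
EOD;

$result = preg_match_all($pattern, $str, $m) ? $m[0] : [];
print_r($result);
```

For the sample string, $result should hold the five expected items.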
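Note that with this somewhat long pattern, it takes only 60 steps to find the five items of the example string. You can also use this shorter, simpler pattern:
~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
but it is a bit less efficient.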
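As a quick sanity check, the sketch below runs both patterns against the sample string and compares the results. The variable names $long, $short, $a and $b are illustrative, and the sketch only checks that the outputs agree. It does not measure step counts.

```php
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

// Longer pattern with the lookahead and unrolled quoted parts.
$long = <<<'EOD'
~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
EOD;

// Shorter alternation-based pattern.
$short = <<<'EOD'
~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
EOD;

preg_match_all($long, $str, $a);
preg_match_all($short, $str, $b);

// Both patterns should return the same list of items for this input.
var_dump($a[0] === $b[0]);
```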