Split string by spaces and colon but not if inside quotes

Question

I have a string like this:

$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

The expected result is:

[0] => Array (
    [0] => dateto:'2015-10-07 15:05'
    [1] => xxxx
    [2] => datefrom:'2015-10-09 15:05'
    [3] => yyyy
    [4] => asdf
)

What I get with:

preg_match_all("/\'(?:[^()]|(?R))+\'|'[^']*'|[^(),\s]+/", $str, $m);

is:

[0] => Array (
    [0] => dateto:'2015-10-07
    [1] => 15:05'
    [2] => xxxx
    [3] => datefrom:'2015-10-09
    [4] => 15:05'
    [5] => yyyy
    [6] => asdf
)

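I also tried preg_split("/[\s]+/", $str), but I don't know how to avoid splitting when the value is between quotes. Can anyone show me how, and please explain the regex? Thank you!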

Answer 1

I would use the PCRE verbs (*SKIP)(*F):

preg_split("~'[^']*'(*SKIP)(*F)|\s+~", $str);
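
Since the question asks for an explanation, here is a minimal runnable sketch of the same call with the pattern annotated. The variable name $parts is only illustrative; the pattern is the one from this answer.

```php
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

// '[^']*'      first alternative: consume a complete single-quoted run
// (*SKIP)(*F)  then force a failure (*F); (*SKIP) makes the engine resume
//              after the quoted run, so nothing inside it is retried
// \s+          second alternative: whitespace outside quotes, the real delimiter
$parts = preg_split("~'[^']*'(*SKIP)(*F)|\s+~", $str);

print_r($parts);
```

Because the quoted runs are skipped rather than matched, the whitespace inside them is never seen by the \s+ branch, so preg_split only cuts between the items.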


Answer 2

For your example, you can use preg_split with a negative lookbehind, (?<!\d), i.e.:

<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";
$matches = preg_split('/(?<!\d)(\s)/', $str);
print_r($matches);

Output:

    Array
    (
        [0] => dateto:'2015-10-07 15:05'
        [1] => xxxx
        [2] => datefrom:'2015-10-09 15:05'
        [3] => yyyy
        [4] => asdf
    )

Demo:

http://ideone.com/EP06Nt

Regex explanation:

(?<!\d)(\s)

Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!\d)»
   Match a single character that is a “digit” «\d»
Match the regex below and capture its match into backreference number 1 «(\s)»
   Match a single character that is a “whitespace character” «\s»
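
As the answer says, this is tailored to the example: it works because every space inside the quotes happens to follow a digit. The sketch below illustrates that limit. The second and third input strings are hypothetical, not taken from the question, and the capture group is dropped because preg_split does not return it without PREG_SPLIT_DELIM_CAPTURE.

```php
<?php
$pattern = '/(?<!\d)\s/';

// The question's shape: the quoted space follows "07", so it is not a split point.
$ok = preg_split($pattern, "dateto:'2015-10-07 15:05' xxxx");

// Hypothetical input: an unquoted space that follows a digit is not split either.
$missed = preg_split($pattern, "id:42 next");

// Hypothetical input: a quoted space that follows a letter is split.
$broken = preg_split($pattern, "name:'John Smith' x");

print_r([$ok, $missed, $broken]);
```

If the data can contain values like these, a quote-aware pattern is the safer choice.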

Answer 3

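Generally, when you want to split a string, preg_split is not the best approach (this seems a bit counterintuitive, but it is true most of the time). A more efficient way is to find all the items (with preg_match_all) using a pattern that describes everything that is not the delimiter (whitespace here):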

$pattern = <<<'EOD'
~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
EOD;

if (preg_match_all($pattern, $str, $m))
    $result = $m[0];

Pattern details:

~                    # pattern delimiter

(?=\S)               # the lookahead assertion only succeeds if there is a non-
                     # white-space character at the current position.
                     # (This lookahead is useful for two reasons:
                     #    - it allows the regex engine to quickly find the start of
                     #      the next item without having to test each branch of the
                     #      following alternation at each position in the string
                     #      until one succeeds.
                     #    - it ensures that there's at least one non-white-space.
                     #      Without it, the pattern may match an empty string.
                     # )

[^'"\s]*          #"'# all that is not a quote or a white-space

(?:                  # optional quoted parts
    '[^']*' [^'"\s]*  #"# single quotes
  |
    "[^"]*" [^'"\s]*    # double quotes
)*
~


Note that with this somewhat long pattern, only 60 steps are needed to find the five items of the example string. You can also use this shorter, simpler pattern:

~(?:[^'"\s]+|'[^']*'|"[^"]*")+~

but it is slightly less efficient.
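
To check that the two patterns agree on the example string, a minimal sketch could run both through preg_match_all and compare the results. The variable names $long, $short, $a and $b are only illustrative.

```php
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

// Longer pattern: lookahead plus an unrolled "unquoted, then quoted" sequence.
$long = <<<'EOD'
~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
EOD;

// Shorter pattern: one or more unquoted runs or quoted strings in a row.
$short = <<<'EOD'
~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
EOD;

preg_match_all($long, $str, $a);
preg_match_all($short, $str, $b);

print_r($a[0]);            // the five items
var_dump($a[0] === $b[0]); // both patterns find the same items here
```

Both patterns treat a quoted string as part of the surrounding item, which is why dateto:'2015-10-07 15:05' comes back as a single element.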
