Decomposing a string into words separated by spaces, ignoring spaces within quoted strings, and considering ( and ) as words

Question

How can I decompose the following string:

+test +word any -sample (+toto +titi "generic test") -column:"test this" (+data id:1234)

into

Array('+test', '+word', 'any', '-sample', '(', '+toto', '+titi', '"generic test"', ')', '-column:"test this"', '(', '+data', 'id:1234', ')')

I want to extend a boolean full-text search SQL query by adding the ability to target a specific column, using the notation column:value or column:"valueA value B".

How can I do this with preg_match_all($regexp, $query, $result), i.e. what is the right regular expression to use?

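Or, more generally, what is the most appropriate regular expression for breaking a string into words that contain no whitespace, where whitespace inside quoted text does not count as whitespace, and where ( and ) count as words whether or not they are surrounded by whitespace? For example, xxx"yyy zzz" should be treated as a single word, and (aaa) should be three words: (, aaa and ).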

I have tried /"(?:\\.|[^\\"])*"|\S+/, but with limited or no success.

Can anyone help?

Answer 1

I think PCRE verbs can be used to achieve your goal:

print_r(preg_split('/".*?"(*SKIP)(*FAIL)|(\(|\))| /', '+test +word any -sampe (+toto +titi "generic test") -column:"test this" (+data id:1234)', -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY));

https://3v4l.org/QnpB9
https://regex101.com/r/pw1mEd/1
https://3v4l.org/dNMkf (with test data)
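
To make the three alternatives in that pattern easier to follow, here is a minimal sketch that wraps the same call in a helper. The function name splitQuery is hypothetical and not part of the answer. The first alternative matches a quoted run and then rejects it with (*SKIP)(*FAIL), so nothing inside the quotes can become a split point. The second captures a parenthesis, which PREG_SPLIT_DELIM_CAPTURE then returns as its own element. The third splits on a single space.

```php
<?php
// Hypothetical helper (the name is illustrative, not from the answer).
function splitQuery(string $query): array
{
    $pattern = '/".*?"(*SKIP)(*FAIL)'  // quoted text: match it, then reject it so it is never a split point
             . '|(\(|\))'              // a parenthesis: captured, so it is returned as its own token
             . '| /';                  // a plain space: split here and drop it
    $parts = preg_split($pattern, $query, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    // preg_split returns false on a regex error; fall back to an empty list.
    return $parts === false ? [] : $parts;
}

print_r(splitQuery('xxx"yyy zzz" (aaa)'));
```

For the two examples from the question, this should give xxx"yyy zzz" as one word and (aaa) as three. Note that the pattern splits on a literal space only, so tabs or newlines in the input would stay inside a token unless the last alternative is changed to \s.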

Answer 2

If you want to match the individual parts using an alternation:

(?:[^\s()":]*:)?"[^"]+"|[^\s()]+|[()]

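Explanation

(?: Non-capturing group, matched as a whole
- [^\s()":]*: Match zero or more characters other than whitespace, ( ) " or :, then match :
)? Close the non-capturing group and make it optional
"[^"]+" Match from the opening double quote to the closing double quote, with at least one character in between
| Or
[^\s()]+ Match one or more characters other than whitespace, ( or )
| Or
[()] Match either ( or )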
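
The breakdown above can also be written directly into the pattern. The sketch below assumes the PCRE x (extended) flag, which ignores unescaped whitespace outside character classes and lets # start a comment. The ~ delimiter and the short test string are choices made for this illustration only.

```php
<?php
// Same pattern as above, written with the x (extended) flag so each part can carry a comment.
$re = '~
    (?: [^\s()":]* : )?   # optional prefix such as -column: (no whitespace, parentheses, quotes or colons before the colon)
    "[^"]+"               # a quoted value, from the opening to the closing double quote
  | [^\s()]+              # or: a run of characters that are not whitespace or parentheses
  | [()]                  # or: a single parenthesis
~x';

preg_match_all($re, 'any -column:"test this" (id:1234)', $matches);
print_r($matches[0]);
```

Here id:1234 is picked up by the second alternative, because the first one requires a double quote after the colon.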

Regex demo | PHP demo

Example code

$re = '/(?:[^\s()":]*:)?"[^"]+"|[^\s()]+|[()]/m';
$str = '+test +word any -sampe (+toto +titi "generic test") -column:"test this" (+data id:1234)';
preg_match_all($re, $str, $matches);
print_r($matches[0]);

Output

Array
(
    [0] => +test
    [1] => +word
    [2] => any
    [3] => -sampe
    [4] => (
    [5] => +toto
    [6] => +titi
    [7] => "generic test"
    [8] => )
    [9] => -column:"test this"
    [10] => (
    [11] => +data
    [12] => id:1234
    [13] => )
)
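
The question also asks for xxx"yyy zzz" to count as a single word. With the pattern above, a quoted part is kept together only when it starts the token or directly follows a prefix ending in a colon, so xxx"yyy zzz" comes out as two matches, xxx"yyy and zzz". One possible variant, offered as a sketch and not as part of the answer, treats a word as any run of unquoted characters and complete quoted segments. The function name tokenize is hypothetical.

```php
<?php
// Hypothetical variant (not from the answer): a word is any mix of
// unquoted characters and complete "quoted segments"; ( and ) stand alone.
function tokenize(string $query): array
{
    $re = '/(?:[^\s()"]+|"[^"]*")+|[()]/';
    preg_match_all($re, $query, $matches);
    return $matches[0];
}

print_r(tokenize('xxx"yyy zzz" (aaa) -column:"test this"'));
```

Under this variant an unbalanced double quote matches neither alternative and is dropped from the result, so input with stray quotes may need validating first.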

php

regex

matching