PHP: Explode comma outside of brackets

Below is the string that I am trying to explode only at the commas that fall outside the outermost set of brackets.

Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour

First attempt

preg_split("/[\[\]|()]+/", "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour", -1, PREG_SPLIT_NO_EMPTY);

Which returns:

[0] => Wheat Flour 
[1] => 2%
[2] => Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin 
[3] => B3
[4] => , Thiamin 
[5] => B1
[6] => , Ascorbic Acid
[7] => , Water, Yeast, Salt, Vegetable Oils 
[8] => Palm, Rapeseed
[9] => , Soya Flour

Second attempt

preg_split('/\|(?![^(]*\))/', "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour");

Returns:

[0] => Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed), Soya Flour

The first attempt is the closest I have been able to get to the following desired output.

[0] => "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid]"
[1] => "Water"
[2] => "Yeast"
[3] => "Salt"
[4] => "Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed))"
[5] => "Soya Flour"

You can use

$text = "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour"; 
if (preg_match_all('~[^][(),\s][^][(),]*(?:\s*(?:(\[(?:[^][]++|(?1))*])|(\((?:[^()]++|(?2))*\))))*~', $text, $matches)) {
    print_r($matches[0]); 
}

See the regex demo and the PHP demo.

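Details:

  • [^][(),\s] - a character other than a square bracket, a parenthesis, a comma or whitespace
  • [^][(),]* - zero or more characters other than square brackets, parentheses and commas
  • (?: - start of a non-capturing group:
    • \s* - zero or more whitespace characters
    • (?: - start of an inner non-capturing group that matches either of:
    • (\[(?:[^][]++|(?1))*]) - Group 1: a [...] substring with any nested [...] inside it
    • | - or
    • (\((?:[^()]++|(?2))*\)) - Group 2: a (...) substring with any nested (...) inside it
    • ) - end of the inner group
  • )* - end of the outer group, repeated zero or more times.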
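The breakdown above can be mirrored in the pattern itself. The following is a minimal sketch, assuming the same input string as in the question, that rewrites the same regex with the x (extended) modifier so that each part carries its own comment. The variable name $pattern and the layout are illustrative only; the matching logic is meant to be unchanged.

```php
<?php
$text = "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour";

// Same regex as above, spread over several lines with the x modifier.
// In x mode, whitespace outside character classes is ignored and # starts a comment.
$pattern = '~
    [^][(),\s]                          # first char: not a bracket, parenthesis, comma or whitespace
    [^][(),]*                           # then any run without brackets, parentheses or commas
    (?:                                 # followed, zero or more times, by
        \s*                             #   optional whitespace
        (?:
            (\[(?:[^][]++|(?1))*])      #   group 1: balanced square brackets, recursing into itself
          |
            (\((?:[^()]++|(?2))*\))     #   group 2: balanced parentheses, recursing into itself
        )
    )*
~x';

preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
```

Because the pattern extracts the items rather than splitting on the separators, the commas and the whitespace after them never become part of a match, so the results need no trimming.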

You may use this PCRE regex for splitting:

(?:(\((?:[^()]*|(?-1))*\))|(\[(?:[^][]*|(?-1))*\]))(*SKIP)(*F)|\h*,\h*

RegEx Demo

Code:

$s = 'Wheat Flour [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour';
$re = '~(?:(\((?:[^()]*|(?-1))*\))|(\[(?:[^][]*|(?-1))*\]))(*SKIP)(*F)|\h*,\h*~';

print_r(preg_split($re, $s));

Output:

Array
(
    [0] => Wheat Flour [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid]
    [1] => Water
    [2] => Yeast
    [3] => Salt
    [4] => Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed))
    [5] => Soya Flour
)
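If the split is needed in more than one place, it could be wrapped in a small helper. This is only a sketch: the function name splitIngredients is hypothetical, and the trim() call, the PREG_SPLIT_NO_EMPTY flag and the error check are additions that are not part of the code above.

```php
<?php
// Hypothetical wrapper around the split regex shown above.
function splitIngredients(string $s): array
{
    $re = '~(?:(\((?:[^()]*|(?-1))*\))|(\[(?:[^][]*|(?-1))*\]))(*SKIP)(*F)|\h*,\h*~';

    // trim() drops leading/trailing whitespace; PREG_SPLIT_NO_EMPTY drops empty
    // pieces, e.g. those produced by a trailing comma.
    $parts = preg_split($re, trim($s), -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure (for example, a backtrack limit error).
    if ($parts === false) {
        throw new RuntimeException('preg_split failed with error code ' . preg_last_error());
    }
    return $parts;
}

// Usage with the string from the question, including the "(2%)" part:
$text = "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour";
print_r(splitIngredients($text));
```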

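RegEx explanation:

  • (?:: start of a non-capturing group
    • (\((?:[^()]*|(?-1))*\)): recursive pattern that matches a possibly nested (...) substring
    • |: or
    • (\[(?:[^][]*|(?-1))*\]): recursive pattern that matches a possibly nested [...] substring
  • ): end of the non-capturing group
  • (*SKIP)(*F): skip and fail this match, i.e. keep this text in the split result
  • |: or
  • \h*,\h*: match a comma surrounded by zero or more horizontal whitespace characters on either side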
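For comparison, the same top-level split can be sketched without a regex by tracking the nesting depth by hand. This is a different technique from the (*SKIP)(*F) pattern above, not a rewrite of it, and the function name splitOutsideBrackets is hypothetical. It assumes balanced input; on an unclosed bracket it treats the rest of the string as nested, which the regex does not.

```php
<?php
// Hypothetical helper: split on commas that sit at nesting depth 0.
function splitOutsideBrackets(string $s): array
{
    $parts = [];
    $current = '';
    $depth = 0;
    $length = strlen($s);

    // Byte-wise scan; this is safe for UTF-8 text because the delimiters are ASCII.
    for ($i = 0; $i < $length; $i++) {
        $ch = $s[$i];
        if ($ch === '(' || $ch === '[') {
            $depth++;                       // entering a nested group
        } elseif (($ch === ')' || $ch === ']') && $depth > 0) {
            $depth--;                       // leaving a nested group
        }

        if ($ch === ',' && $depth === 0) {
            $parts[] = trim($current);      // top-level comma: close the current item
            $current = '';
        } else {
            $current .= $ch;
        }
    }

    // Keep the last item unless it is empty (e.g. after a trailing comma).
    if (trim($current) !== '') {
        $parts[] = trim($current);
    }
    return $parts;
}

print_r(splitOutsideBrackets('Water, Vegetable Oils (Palm, oils (sunflower, rapeseed)), Soya Flour'));
```

The counter does not distinguish ( from [, so a mismatched pair such as "(a]" is accepted silently; the recursive regex is stricter in that respect.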