PHP: Explode comma outside of brackets
Below is the string I'm trying to explode only at the commas that are outside the outermost set of brackets.
Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour
First attempt
preg_split("/[\[\]|()]+/", "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour", -1, PREG_SPLIT_NO_EMPTY);
which returns:
[0] => Wheat Flour
[1] => 2%
[2] => Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin
[3] => B3
[4] => , Thiamin
[5] => B1
[6] => , Ascorbic Acid
[7] => , Water, Yeast, Salt, Vegetable Oils
[8] => Palm, Rapeseed
[9] => , Soya Flour
Second attempt
preg_split('/\|(?![^(]*\))/', "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour");
Returns:
[0] => Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed), Soya Flour
The first attempt is the closest I have been able to get to the desired output below:
[0] => "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid]"
[1] => "Water"
[2] => "Yeast"
[3] => "Salt"
[4] => "Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed))"
[5] => "Soya Flour"
You can use
$text = "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour";
// Each match is one top-level item: plain text followed by any number of balanced [...] or (...) groups
if (preg_match_all('~[^][(),\s][^][(),]*(?:\s*(?:(\[(?:[^][]++|(?1))*])|(\((?:[^()]++|(?2))*\))))*~', $text, $matches)) {
    print_r($matches[0]);
}
See the regex demo and the PHP demo.
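Details:
[^][(),\s] - a character other than square brackets, parentheses, a comma or whitespace, so each match starts at a real item character and the ", " separators are skipped
[^][(),]* - zero or more characters other than square brackets, parentheses or commas
(?: - a non-capturing group:
    \s* - zero or more whitespace characters
    (?: - a non-capturing group matching either
        (\[(?:[^][]++|(?1))*]) - a [...] substring with any nested [...] inside (Group 1; (?1) recurses into this group)
        | - or
        (\((?:[^()]++|(?2))*\)) - a (...) substring with any nested parentheses inside (Group 2; (?2) recurses into this group)
    ) - end of the alternation
)* - end of the outer group; the whole sequence is optional and repeated zero or more times.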
You can use this PCRE regex to split:
(?:(\((?:[^()]*|(?-1))*\))|(\[(?:[^][]*|(?-1))*\]))(*SKIP)(*F)|\h*,\h*
Code:
$s = 'Wheat Flour [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour';
// Bracketed substrings are matched and then skipped, so only top-level commas act as split points
$re = '~(?:(\((?:[^()]*|(?-1))*\))|(\[(?:[^][]*|(?-1))*\]))(*SKIP)(*F)|\h*,\h*~';
print_r(preg_split($re, $s));
Output:
Array
(
[0] => Wheat Flour [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid]
[1] => Water
[2] => Yeast
[3] => Salt
[4] => Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed))
[5] => Soya Flour
)
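The sample string in this answer leaves out the (2%) that appears in the question. A quick sketch applying the same pattern to the question's full string: (2%) is matched and skipped like the other bracketed parts, so the result should be the desired output from the question.

```php
<?php
// Same pattern as in this answer; (?-1) recurses into the group it appears in.
$re = '~(?:(\((?:[^()]*|(?-1))*\))|(\[(?:[^][]*|(?-1))*\]))(*SKIP)(*F)|\h*,\h*~';

// The question's original string, including "(2%)".
$text = 'Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour';

$parts = preg_split($re, $text);
print_r($parts);
```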
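Regex explanation:
(?: - start a non-capturing group
    (\((?:[^()]*|(?-1))*\)) - a recursive pattern matching a possibly nested (...) substring; (?-1) refers to the most recently opened capturing group, i.e. this Group 1 itself
    | - or
    (\[(?:[^][]*|(?-1))*\]) - a recursive pattern matching a possibly nested [...] substring (Group 2)
) - end of the non-capturing group
(*SKIP)(*F) - fail the match here and make the engine resume right after the bracketed text, so no comma inside it can become a split point and it stays intact in the split result
| - or
\h*,\h* - a comma with zero or more horizontal whitespace characters on either side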
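If the recursive regex is hard to maintain, the same rule (split only at commas where the bracket depth is zero) can be implemented with a plain loop and a depth counter. This is a swapped-in, non-regex technique rather than either answer's method, and splitOutsideBrackets is a hypothetical name. The sketch assumes balanced brackets and does not check that a ( is closed by ) rather than ]:

```php
<?php
// Hypothetical helper: split on commas that sit at bracket depth 0.
// Assumes balanced brackets; ( and [ are treated alike when counting depth.
function splitOutsideBrackets(string $text): array
{
    $parts = [];
    $current = '';
    $depth = 0;
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        $ch = $text[$i];
        if ($ch === '(' || $ch === '[') {
            $depth++;
        } elseif ($ch === ')' || $ch === ']') {
            $depth = max(0, $depth - 1); // guard against a stray closer
        } elseif ($ch === ',' && $depth === 0) {
            $parts[] = trim($current); // top-level comma: close the current item
            $current = '';
            continue;                  // the separator itself is not kept
        }
        $current .= $ch;
    }
    $parts[] = trim($current); // last item has no trailing comma
    return $parts;
}

$text = 'Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour';
print_r(splitOutsideBrackets($text));
```

Iterating byte by byte is safe for UTF-8 input, because the bytes for ( ) [ ] and , never occur inside a multibyte UTF-8 sequence.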