PHP: How to split a string by dash and everything between brackets. (preg_split or preg_match)
I've been racking my brain over this for days, but I can't seem to get the result I want.
Example:
$var = "Some Words - Other Words (More Words) Dash-Binded-Word";
Desired result:
array(
[0] => Some Words
[1] => Other Words
[2] => More Words
[3] => Dash-Binded-Word
)
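I can get most of this working with preg_match_all, but then "Dash-Binded-Word" gets broken apart as well. Trying to match the dashes together with their surrounding spaces doesn't work either, because then it breaks up every word except the dash-bound ones.
The preg_match_all statement I'm using (which also breaks up the dash-bound word) is: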
preg_match_all('#\(.*?\)|\[.*?\]|[^?!\-|\(|\[]+#', $var, $array);
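A quick way to see why that attempt fails is to dump its full matches. In this small sketch, the third alternative's negated class [^?!\-|\(|\[] excludes the hyphen, so every hyphen ends a match, including the ones inside Dash-Binded-Word, and the spaces around the separators are kept as well.

```php
<?php
// The sample string from the question.
$var = "Some Words - Other Words (More Words) Dash-Binded-Word";

// The original attempt: a (...) group, a [...] group, or a run of characters
// other than ?, !, -, |, ( and [. Because '-' is excluded, each hyphen ends a match.
preg_match_all('#\(.*?\)|\[.*?\]|[^?!\-|\(|\[]+#', $var, $array);

// Full matches are in $array[0]; var_export makes the leftover spaces visible.
// Yields: 'Some Words ', ' Other Words ', '(More Words)', ' Dash', 'Binded', 'Word'
var_export($array[0]);
```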
I'm certainly no expert on preg_match or preg_split, so any help would be appreciated.
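You can split with:
/\s*(?<!\w(?=.\w))[\-[\]()]\s*/
Explanation:
- It tries to match the character class [\-[\]()] (any one of these characters). You can also add other characters to that class.
- It uses a negative lookbehind, (?<!\w), meaning "not preceded by a word character".
- Nested inside the lookbehind is a lookahead, (?=.\w), which checks: "if the first condition is met, it shouldn't be followed by any char (the one used to split) and a word character". Put together, a split is blocked only when the separator is both preceded and followed by a word character, as the hyphens in Dash-Binded-Word are.
- \s* at the start and end trims the surrounding spaces.
Code: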
$input_line = "Some Words - Other Words (More Words) Dash-Binded-Word";
$result = preg_split("/\s*(?<!\w(?=.\w))[\-[\]()]\s*/", $input_line);
var_dump($result);
Output:
array(4) {
[0]=>
string(10) "Some Words"
[1]=>
string(11) "Other Words"
[2]=>
string(10) "More Words"
[3]=>
string(16) "Dash-Binded-Word"
}
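Since other separators can be added to the character class, here is a minimal sketch assuming a pipe | should also split and that the input may end with a closing bracket (both hypothetical). A trailing ) is still a split point, so preg_split() would return an empty last element; the PREG_SPLIT_NO_EMPTY flag drops it.

```php
<?php
// Hypothetical input: uses '|' as an extra separator and ends with ')'.
$input_line = "Alpha | Beta-Gamma (Delta)";

// Same pattern as above, with '|' added to the character class.
// A separator is skipped only when it sits between two word characters.
$pattern = '/\s*(?<!\w(?=.\w))[\-[\]()|]\s*/';

// PREG_SPLIT_NO_EMPTY removes the empty string left after the final ')'.
$parts = preg_split($pattern, $input_line, -1, PREG_SPLIT_NO_EMPTY);

var_dump($parts);
```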
Capturing the parentheses
As mentioned in another comment, if you also want to capture the brackets:
$result = preg_split("/\s*(?:(?<!\w)-(?!\w)|(\(.*?\)|\[.*?]))\s*/", $input_line, -1, PREG_SPLIT_DELIM_CAPTURE);
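With PREG_SPLIT_DELIM_CAPTURE, whatever the capturing group matched (the bracketed part, brackets included) comes back as its own element, while the free-standing " - " separator, which does not set the group, adds nothing. A short sketch on the question's string:

```php
<?php
$input_line = "Some Words - Other Words (More Words) Dash-Binded-Word";

// Split on a hyphen with no word character on either side, or on a (...) / [...]
// group. Only the bracketed group is captured, so only it is returned as an element.
$result = preg_split(
    '/\s*(?:(?<!\w)-(?!\w)|(\(.*?\)|\[.*?]))\s*/',
    $input_line,
    -1,
    PREG_SPLIT_DELIM_CAPTURE
);

// Elements: "Some Words", "Other Words", "(More Words)", "Dash-Binded-Word"
print_r($result);
```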
Try this (a combination of str_replace and explode). It isn't optimal, but it may work for this case:
$var = "Some Words - Other Words (More Words) Dash-Binded-Word";
$arr = Array(" - ", " (", ") ");
$var2 = str_replace($arr, "|", $var);
$final = explode('|', $var2);
var_dump($final);
Output:
array(4) { [0]=> string(10) "Some Words" [1]=> string(11) "Other Words" [2]=> string(10) "More Words" [3]=> string(16) "Dash-Binded-Word" }
$var = "Some Words - Other Words (More Words) Dash-Binded-Word";
$var = preg_replace('/[^A-Za-z\-]/', ' ', $var); // Replaces everything except letters and hyphens with spaces.
$var = str_replace('-', ' ', $var); // Replaces all hyphens with spaces.
// Collapses runs of whitespace into a single space, then explode() splits the string at each space.
print_r(explode(" ", preg_replace('!\s+!', ' ', $var)));
Output:
Array ( [0] => Some [1] => Words [2] => Other [3] => Words [4] => More [5] => Words [6] => Dash [7] => Binded [8] => Word )
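You can use a simple preg_match_all with:
\w+(?:[- ]\w+)*
- \w+: one or more alphanumeric characters or underscores
- (?:[- ]\w+)*: zero or more repetitions of:
  - [- ]: a hyphen or a space (you can change the space to \s to match any whitespace)
  - \w+: one or more alphanumeric characters or underscores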
$re = '/\w+(?:[- ]\w+)*/';
$str = "Some Words - Other Words (More Words) Dash-Binded-Word";
preg_match_all($re, $str, $matches);
print_r($matches[0]);
Result:
Array
(
[0] => Some Words
[1] => Other Words
[2] => More Words
[3] => Dash-Binded-Word
)
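The tip about swapping the literal space for \s can be checked with a small sketch. The tab-separated input below is hypothetical: [- ] only joins words across a hyphen or a literal space, so the tab splits the phrase, while [-\s] accepts any whitespace character.

```php
<?php
// Hypothetical input: "Some" and "Words" are separated by a tab.
$str = "Some\tWords - Other Words";

// [- ] joins only on a hyphen or a literal space, so the tab breaks the phrase.
preg_match_all('/\w+(?:[- ]\w+)*/', $str, $spaceOnly);

// [-\s] also accepts tabs, newlines and other whitespace between word runs.
preg_match_all('/\w+(?:[-\s]\w+)*/', $str, $anyWhitespace);

print_r($spaceOnly[0]);
print_r($anyWhitespace[0]);
```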
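Modifying the input string to suit a particular explode technique is indirect, and it suggests that a suboptimal exploding technique is being used.
The truth is, your required logic boils down to: "explode on every sequence of two or more non-word characters". This is what that looks like as a preg_split() pattern.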
Code:
$var = "Some Words - Other Words (More Words) Dash-Binded-Word";
var_export(preg_split('~\W{2,}~', $var));
Output:
array (
0 => 'Some Words',
1 => 'Other Words',
2 => 'More Words',
3 => 'Dash-Binded-Word',
)
It doesn't get any simpler than that.
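One caveat worth checking: \W{2,} needs at least two non-word characters in a row, so a single bracket at either end of the string is not consumed. A minimal sketch with a hypothetical input ending in ), assuming it is acceptable to trim spaces and brackets from both ends before splitting:

```php
<?php
// Hypothetical input that ends with a single closing bracket.
$var = "Some Words - Other Words (More Words)";

// The final ')' is only one non-word character, so it stays attached to "More Words".
$plain = preg_split('~\W{2,}~', $var);

// Trimming spaces and brackets from both ends first avoids that.
$trimmed = preg_split('~\W{2,}~', trim($var, ' ()[]'));

var_export($plain);
var_export($trimmed);
```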