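How to split text to match double quotes plus trailing text to dot?
How can I split text into sentences when a sentence inside double quotes contains a dot that must not be split on?
The sample document looks like this: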
“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.
I would like to get output like this:
Array
(
[0] =>"Chess helps us overcome difficulties and sufferings," said Unnikrishnan, taking my queen.
[1] =>"On a chess board you are fighting. as we are also fighting the hardships in our daily life," he said.
)
My code still explodes on every dot:
function sample($string)
{
    $data = array();
    $break = explode(".", $string);
    array_push($data, $break);
    print_r($data);
}
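I am still confused about how to split on the two delimiters, double quotes and dots, because one of the sentences inside the double quotes contains the dot delimiter.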
A perfect example for (*SKIP)(*FAIL):
“[^“”]+”(*SKIP)(*FAIL)|\.\s*
# “[^“”]+” looks for strings in curly double quotes
# (*SKIP)(*FAIL) throws them away
# \.\s* matches a literal dot, optionally followed by whitespace
In PHP:
$regex = '~“[^“”]+”(*SKIP)(*FAIL)|\.\s*~';
$parts = preg_split($regex, $your_string_here);
This yields
Array
(
[0] => “Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen
[1] => “On a chess board you are fighting. as we are also fighting the hardships in our daily life.”
)
See a demo on regex101.com as well as a demo on ideone.com.
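The pattern above consumes each dot it splits on, so the pieces lose their closing period, and when the input ends with a dot (as the question's sample does with "he said.") preg_split() also returns a trailing empty string. The following is a minimal sketch of one way around both, assuming the script is saved as UTF-8: the (*SKIP)(*FAIL) branch stays the same, but the split happens on whitespace that is preceded by a dot. The lookbehind, the u modifier, and the helper name splitOutsideQuotes are illustrative additions, not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): split on whitespace
// that follows a dot, but never inside a “...” quotation.
function splitOutsideQuotes(string $text): array
{
    // Branch 1 matches a whole curly-quoted span and discards it via (*SKIP)(*FAIL).
    // Branch 2 matches whitespace preceded by a dot; the lookbehind keeps the dot
    // in the preceding piece instead of consuming it.
    $regex = '~“[^“”]+”(*SKIP)(*FAIL)|(?<=\.)\s+~u';
    return preg_split($regex, $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

$text = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.';

$parts = splitOutsideQuotes($text);

// Convert curly quotes to straight double quotes, as in the desired output.
$parts = preg_replace('/[“”]/u', '"', $parts);

print_r($parts);
```

With the question's sample this gives two elements, each keeping its final period, and the dot inside the second quotation is left alone.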
Here is a simpler pattern: preg_split() followed by preg_replace() to fix up the left and right double quotes (Demo):
$in = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.';
$out = preg_split('/ (?=“)/', $in, 0, PREG_SPLIT_NO_EMPTY);
//$out = preg_match_all('/“.+?(?= “|$)/', $in, $out) ? $out[0] : null;
$find = '/[“”]/u'; // unicode flag is essential
$replace = '"';
$out = preg_replace($find, $replace, $out); // replace curly quotes with standard double quotes
var_export($out);
Output:
array (
0 => '"Chess helps us overcome difficulties and sufferings," said Unnikrishnan, taking my queen.',
1 => '"On a chess board you are fighting. as we are also fighting the hardships in our daily life." he said.',
)
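preg_split() matches a space that is followed by “ (a left double quote).
The preg_replace() step needs a pattern with the u modifier to ensure that the left and right double quotes in the character class are recognized. Using '/“|”/' means you can drop the u modifier, but it roughly doubles the number of steps the regex engine has to perform (in this case, my character class takes only 189 steps versus 372 steps with the pipe).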
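The effect of the u modifier can be checked directly. The sketch below assumes the script file is saved as UTF-8, where each curly quote occupies three bytes; the sample string and variable names are illustrative.

```php
<?php
$sample = '“a”';

// Without u, the class is read byte by byte, so each of the three bytes
// of every curly quote is replaced on its own.
$withoutU = preg_replace('/[“”]/', '"', $sample);

// With u, the class holds two code points and each quote is replaced once.
$withU = preg_replace('/[“”]/u', '"', $sample);

// Alternation compares whole byte sequences, so it works without u.
$alternation = preg_replace('/“|”/', '"', $sample);

var_export([$withoutU, $withU, $alternation]);
// $withoutU    is '"""a"""'
// $withU       is '"a"'
// $alternation is '"a"'
```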
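Also, regarding the choice between preg_split() and preg_match_all(): I chose preg_split() because the objective is merely to split the string on the space that is followed by a left double quote. preg_match_all() would be the more practical choice if the objective were to omit substrings that are not adjacent to the delimiting space character.
Regardless of my logic, if you still want to use preg_match_all(), my preg_split() line can be replaced with: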
$out = preg_match_all('/“.+?(?= “|$)/', $in, $out) ? $out[0] : null;
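The difference between the two functions shows up when the input begins with text that does not start with a left double quote. Here is a small sketch using a made-up input string (not from the question) and illustrative variable names:

```php
<?php
// Hypothetical input: the leading "Intro text." is not part of any quotation.
$in = 'Intro text. “A.” b. “C.” d.';

// Splitting keeps every piece, including the leading unquoted text.
$split = preg_split('/ (?=“)/', $in, 0, PREG_SPLIT_NO_EMPTY);

// Matching only returns substrings that start with a left double quote.
$matched = preg_match_all('/“.+?(?= “|$)/', $in, $m) ? $m[0] : [];

var_export($split);   // 'Intro text.', '“A.” b.', '“C.” d.'
var_export($matched); // '“A.” b.', '“C.” d.'
```

On the question's sample, which starts with a quotation, both approaches return the same two elements.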
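Alternatively:
regex101 (16 steps)
“.[^”]+”(?:.[^“]+)?
“.[^”]+” matches everything between “ and ”.
(?:.[^“]+)? optionally matches (hence the trailing ?) everything that is not an opening “; the ?: marks the group as non-capturing.
PHP (PHPfiddle, click "Run-F9"), updated to replace “ and ” with ":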
<?php
$str = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.”';
// The u modifier makes the pattern treat “ and ” as single characters.
if (preg_match_all('/“.[^”]+”(?:.[^“]+)?/u', $str, $matches)) {
    echo '<pre>';
    // Replace curly quotes with straight double quotes.
    print_r(preg_replace('/[“”]/u', '"', $matches[0]));
    echo '</pre>';
}
?>
Output:
Array
(
[0] => "Chess helps us overcome difficulties and sufferings," said Unnikrishnan, taking my queen.
[1] => "On a chess board you are fighting. as we are also fighting the hardships in our daily life."
)