How to split text to match double quotes plus trailing text to dot?

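How can I split text into sentences when a dot inside double quotes must not be treated as a split point?

The example document looks like this: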

“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.

I want to get output like this:

Array
(
    [0] =>"Chess helps us overcome difficulties and sufferings," said Unnikrishnan, taking my queen.
    [1] =>"On a chess board you are fighting. as we are also fighting the hardships in our daily life," he said.
 )

My code still explodes on every dot.

function sample($string)
{
    $data=array();
    $break=explode(".", $string);
    array_push($data, $break);

    print_r($data);
}

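I am still confused about how to split on two delimiters, the double quotes and the dot, because the sentence inside the double quotes itself contains the dot delimiter.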

A perfect example for (*SKIP)(*FAIL):

“[^“”]+”(*SKIP)(*FAIL)|\.\s*
# looks for strings in curly double quotes
# throws those matches away via (*SKIP)(*FAIL)
# otherwise matches a literal dot, followed by optional whitespace


PHP:

$regex = '~“[^“”]+”(*SKIP)(*FAIL)|\.\s*~';
$parts = preg_split($regex, $your_string_here);

This yields

Array
(
    [0] => “Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen
    [1] => “On a chess board you are fighting. as we are also fighting the hardships in our daily life.”
)

See a demo on regex101.com as well as a demo on ideone.com.
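The split above consumes the dot, while the question asks for each sentence to keep it. A possible variation, offered only as a sketch and not as the pattern from the demos: replace `\.\s*` with a lookbehind so that only the whitespace after a dot is consumed. The variable names `$text` and `$parts`, the `u` modifier, and the `PREG_SPLIT_NO_EMPTY` flag are choices made for this illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
// Sample text from the question, including the trailing " he said."
$text = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. '
      . '“On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.';

// First alternative: match a whole quoted chunk, then discard it with (*SKIP)(*FAIL)
// so nothing inside the quotes can be used as a split point.
// Second alternative: whitespace that directly follows a dot. The lookbehind
// does not consume the dot, so it stays attached to the preceding sentence.
$regex = '~“[^“”]+”(*SKIP)(*FAIL)|(?<=\.)\s+~u';
$parts = preg_split($regex, $text, -1, PREG_SPLIT_NO_EMPTY);

// Optional: turn the curly quotes into straight double quotes.
$parts = str_replace(['“', '”'], '"', $parts);

print_r($parts);
```

With the sample text this should give two elements, each ending in its own dot, with the inner "fighting." left untouched.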

Here is a simpler pattern: preg_split() followed by preg_replace() to fix up the left and right double quotes (Demo):

$in = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.';

$out = preg_split('/ (?=“)/', $in, 0, PREG_SPLIT_NO_EMPTY);
//$out = preg_match_all('/“.+?(?= “|$)/', $in, $out) ? $out[0] : null;

$find = '/[“”]/u';  // unicode flag is essential
$replace = '"';
$out = preg_replace($find, $replace, $out);  // replace curly quotes with standard double quotes

var_export($out);

Output:

array (
  0 => '"Chess helps us overcome difficulties and sufferings," said Unnikrishnan, taking my queen.',
  1 => '"On a chess board you are fighting. as we are also fighting the hardships in our daily life." he said.',
)

The preg_split() pattern matches a space that is followed by a “ (left double quote).

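The preg_replace() step needs a pattern with the u modifier to make sure the left and right double quotes inside the character class are recognized. Using '/“|”/' means you can drop the u modifier, but it doubles the number of steps the regex engine has to perform (for this case, my character class takes just 189 steps versus 372 steps with the pipe character).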
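To see why the u modifier matters for the character class, here is a small sketch on a hypothetical string `'“Hi”'` (not part of the original demo). It assumes the source file is saved as UTF-8, where each curly quote is three bytes long.

```php
<?php
$quoted = '“Hi”';

// Without u, the class is built from the individual bytes of the two quotes,
// so every byte of a curly quote is matched and replaced on its own.
$withoutU = preg_replace('/[“”]/', '"', $quoted);

// With u, each curly quote is a single code point and is replaced once.
$withU = preg_replace('/[“”]/u', '"', $quoted);

echo $withoutU, PHP_EOL; // """Hi"""
echo $withU, PHP_EOL;    // "Hi"
```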

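Furthermore, regarding the choice between preg_split() and preg_match_all(): the reason to go with preg_split() is that the objective is merely to split the string on the space that is followed by a left double quote. preg_match_all() would be the more practical choice if the objective were to omit substrings that do not neighbor the delimiting space character.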

Regardless of my logic, if you still want to use preg_match_all(), my preg_split() line can be replaced with:

$out = preg_match_all('/“.+?(?= “|$)/', $in, $out) ? $out[0] : null;
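The difference between the two calls only shows up when the input contains text that does not start with a left double quote. A minimal sketch, using a made-up input with a leading unquoted sentence (the string and the names `$split` and `$matched` are assumptions for illustration; the u modifier is added here as a precaution):

```php
<?php
$in = 'Intro text. “First quote.” he said. “Second quote.” she said.';

// preg_split() keeps every piece, including the unquoted lead-in.
$split = preg_split('/ (?=“)/u', $in, -1, PREG_SPLIT_NO_EMPTY);

// preg_match_all() only returns substrings that begin with a left double quote,
// so the lead-in is omitted.
preg_match_all('/“.+?(?= “|$)/u', $in, $m);
$matched = $m[0];

print_r($split);   // three elements, the first being "Intro text."
print_r($matched); // two elements, both starting with a left double quote
```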

Alternatively:

regex101 (16 steps)

“.[^”]+”(?:.[^“]+)?

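  • “.[^”]+” matches everything between “ and ”.
  • (?:.[^“]+)? optionally matches (hence the trailing ?) everything that follows, up to but not including the next opening “. The ?: marks the group as non-capturing.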

PHP - PHPfiddle: - hit "Run-F9" - [updated to replace “ and ” with "]

<?php
    $str = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.”';

if(preg_match_all('/“.[^”]+”(?:.[^“]+)?/',$str, $matches)){
    echo '<pre>';
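    // Replace curly quotes with straight double quotes. Slash delimiters are used
    // so the pattern cannot be mistaken for a character class.
    print_r(preg_replace('/“|”/', '"', $matches[0]));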
    echo '</pre>';
}
?>

Output:

Array
(
    [0] => "Chess helps us overcome difficulties and sufferings," said Unnikrishnan, taking my queen. 
    [1] => "On a chess board you are fighting. as we are also fighting the hardships in our daily life."
)