How to split text to match double quotes plus trailing text to dot?

Question

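How can I split text into sentences when a quoted passage itself contains a dot that must not trigger a split?

The sample document looks like this: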

“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.

I want to get output like this:

Array
(
    [0] =>"Chess helps us overcome difficulties and sufferings," said Unnikrishnan, taking my queen.
    [1] =>"On a chess board you are fighting. as we are also fighting the hardships in our daily life," he said.
 )

My code still just explodes on every dot.

function sample($string)
{
    $data=array();
    $break=explode(".", $string);
    array_push($data, $break);

    print_r($data);
}

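I am still confused about how to split on two delimiters, the double quotes and the dot, because the sentence inside the double quotes itself contains the dot delimiter.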

Answer 1

A perfect example for (*SKIP)(*FAIL):

“[^“”]+”(*SKIP)(*FAIL)|\.\s*
# looks for strings in double quotes
# throws them away
# then matches a literal dot, followed by optional whitespace

In PHP:

$regex = '~“[^“”]+”(*SKIP)(*FAIL)|\.\s*~';
$parts = preg_split($regex, $your_string_here);

This yields

Array
(
    [0] => “Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen
    [1] => “On a chess board you are fighting. as we are also fighting the hardships in our daily life.”
)

See a demo on regex101.com as well as a demo on ideone.com.
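
Applied to the full sample from the question, which ends in "he said.", the final dot is also a split point, so a plain preg_split() call leaves an empty string at the end of the result. The sketch below is one way to handle that, assuming the source file is saved as UTF-8. The u modifier and the PREG_SPLIT_NO_EMPTY flag are additions for illustration and are not part of the pattern shown above.

```php
<?php
// Sample text from the question (assumes this file is saved as UTF-8).
$text = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. '
      . '“On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.';

// Quoted spans are matched and then discarded via (*SKIP)(*FAIL), so only
// dots outside the curly quotes act as delimiters. The u modifier (an
// addition here) makes the character class work on code points, not bytes.
$regex = '~“[^“”]+”(*SKIP)(*FAIL)|\.\s*~u';

// PREG_SPLIT_NO_EMPTY (also an addition) drops the empty piece that the
// final dot would otherwise leave at the end of the result.
$parts = preg_split($regex, $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts);
```

Because the dots are the delimiters, they are consumed by the split: the two pieces end in "queen" and "he said" without a trailing dot.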

Answer 2

Here is a simpler pattern: preg_split() followed by preg_replace() to normalize the left and right double quotes (Demo):

$in = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.';

$out = preg_split('/ (?=“)/', $in, 0, PREG_SPLIT_NO_EMPTY);
//$out = preg_match_all('/“.+?(?= “|$)/', $in, $out) ? $out[0] : null;

$find = '/[“”]/u';  // unicode flag is essential
$replace = '"';
$out = preg_replace($find, $replace, $out);  // replace curly quotes with standard double quotes

var_export($out);

Output:

array (
  0 => '"Chess helps us overcome difficulties and sufferings," said Unnikrishnan, taking my queen.',
  1 => '"On a chess board you are fighting. as we are also fighting the hardships in our daily life." he said.',
)

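preg_split() matches a space that is followed by “ (a left double quote).

The preg_replace() step needs a pattern with the u modifier to ensure that the left and right double quotes in the character class are recognized. Using '/“|”/' means you can drop the u modifier, but it doubles the number of steps the regex engine has to perform (for this case, my character class takes only 189 steps versus 372 steps with the pipe character).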
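
To make the point about the u modifier concrete, here is a small sketch on a hypothetical two-quote string, assuming the source file is saved as UTF-8 so that each curly quote occupies three bytes. Without u, the character class is read as a set of individual bytes, so each byte of a quote is replaced separately.

```php
<?php
// Hypothetical sample; assumes this file is saved as UTF-8, so each curly
// quote is a three-byte sequence.
$s = '“Hi”';

// Without u, the class holds the individual bytes of both quotes, so every
// byte of each quote is replaced on its own.
$bytewise = preg_replace('/[“”]/', '"', $s);

// With u, the class holds two code points and each quote is replaced once.
$unicode = preg_replace('/[“”]/u', '"', $s);

// Alternation compares whole byte sequences, so it works without u.
$alternation = preg_replace('/“|”/', '"', $s);

var_export([$bytewise, $unicode, $alternation]);
```

The first result contains three straight quotes in place of each curly quote, while the other two contain exactly one.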

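Also, regarding the choice between preg_split() and preg_match_all(): I went with preg_split() because the objective is merely to split the string on the space that is followed by a left double quote. preg_match_all() would be the more practical choice if the objective were to omit substrings that are not adjacent to the delimiting space character.

Regardless of my logic, if you still want to use preg_match_all(), my preg_split() line can be replaced with: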

$out = preg_match_all('/“.+?(?= “|$)/', $in, $out) ? $out[0] : null;
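
The difference between the two calls only shows up when some text does not start at a left double quote. The sketch below uses a made-up input with a leading "Intro." (not from the question, and assuming a UTF-8 source file) to compare them: preg_split() keeps the leading piece, while preg_match_all() returns only the pieces that begin with “.

```php
<?php
// Hypothetical input with text before the first quote (UTF-8 file assumed).
$in = 'Intro. “A.” b. “C.” d.';

// Splitting on the space before each left quote keeps the leading piece.
$split = preg_split('/ (?=“)/', $in, 0, PREG_SPLIT_NO_EMPTY);

// Matching returns only pieces that start at a left quote and run up to
// the next " “" or the end of the string.
$matched = preg_match_all('/“.+?(?= “|$)/', $in, $m) ? $m[0] : null;

var_export($split);
var_export($matched);
```

For the sample string in this answer, which starts with a left double quote, both approaches should give the same two pieces.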

Answer 3

Alternatively:

regex101 (16 steps)

“.[^”]+”(?:.[^“]+)?

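“.[^”]+” matches everything from “ to ”.
(?:.[^“]+)? optionally matches (hence the trailing ?) everything that follows up to the next opening “; ?: marks the group as non-capturing.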

PHP (PHPfiddle, click "Run - F9"; updated to replace “ and ” with "):

<?php
    $str = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.”';

if(preg_match_all('/“.[^”]+”(?:.[^“]+)?/',$str, $matches)){
    echo '<pre>';
    print_r(preg_replace('/“|”/', '"', $matches[0]));  // replace curly quotes with straight double quotes
    echo '</pre>';
}
?>

Output:

Array
(
    [0] => "Chess helps us overcome difficulties and sufferings," said Unnikrishnan, taking my queen. 
    [1] => "On a chess board you are fighting. as we are also fighting the hardships in our daily life."
)
