How to detect and remove duplicate sentences in a string?

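Having just received help here on another question, I wonder whether my next problem can be solved just as easily.

Basically, because a PDF to Excel conversion went badly, each cell ends up containing the same sentence repeated many times.

For example: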

$bad_string = "B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >";

$good_string = goodFunction($bad_string);
// expected result: 'B7R, B9R, B12R, B12M 430mm Disc 2005 >'

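How can this be done? The condition is that the bad string is one sentence repeated X times. The sentence never varies; it is as if it had been copied and pasted several times (because of the poor PDF to Excel conversion).

Is there a solution?

I would use preg_replace. I assume the repeated strings are consecutive.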

$bad_string = "B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >";
echo preg_replace('~^(.*?)\1+$~', '$1', $bad_string);

Output:

B7R, B9R, B12R, B12M 430mm Disc 2005 >
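
As a minimal sketch, the backreference approach can be wrapped in a small reusable function. The function name collapseRepeats, the s modifier (so that . also matches newlines inside a cell), and the fallback to the original input when preg_replace returns null are assumptions added for illustration.

```php
<?php
// Hypothetical helper: collapse a string that consists of one chunk repeated back to back.
function collapseRepeats(string $text): string
{
    // (.*?) lazily captures the shortest prefix; \1+ then requires the rest of
    // the string, up to the end, to be one or more further copies of that prefix.
    $result = preg_replace('~^(.*?)\1+$~s', '$1', $text);

    // preg_replace returns null on a PCRE error (for example, hitting the backtrack limit).
    return $result ?? $text;
}

$sentence = 'B7R, B9R, B12R, B12M 430mm Disc 2005 >';
echo collapseRepeats(str_repeat($sentence, 6)), "\n"; // B7R, B9R, B12R, B12M 430mm Disc 2005 >
echo collapseRepeats('no repetition here'), "\n";     // no repetition here
```

A string that is not a whole number of consecutive copies of some prefix does not match the pattern, so it comes back unchanged.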


If each sentence must end with a > symbol, you can use this regex instead.

(.*?>)(?=(?:.*?\1)+$)


$bad_string = "foo B7R, B9R, B12R, B12M 430mm Disc 2005 > bar B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >";
echo preg_replace('~(.*?>)(?=(?:.*?\1)+$)~', '', $bad_string);

Output:

foo  bar B7R, B9R, B12R, B12M 430mm Disc 2005 >
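
The lookahead variant can be sketched the same way. It deletes every > terminated chunk that occurs again later in the string, so only the last copy survives. The function name dropEarlierCopies, the s modifier, and the null fallback are assumptions added for illustration.

```php
<?php
// Hypothetical helper: remove each '>'-terminated chunk that reappears later in the string.
function dropEarlierCopies(string $text): string
{
    // (.*?>) captures a chunk ending in '>'; the lookahead (?=(?:.*?\1)+$)
    // only lets it match if the same chunk occurs again before the end.
    $result = preg_replace('~(.*?>)(?=(?:.*?\1)+$)~s', '', $text);

    // preg_replace returns null on a PCRE error; fall back to the input in that case.
    return $result ?? $text;
}

$sentence = 'B7R, B9R, B12R, B12M 430mm Disc 2005 >';
$input = 'foo ' . $sentence . ' bar ' . str_repeat($sentence, 5);
echo dropEarlierCopies($input), "\n"; // foo  bar B7R, B9R, B12R, B12M 430mm Disc 2005 >
```

One caveat with this pattern: the captured chunk can start anywhere, so two different sentences that merely share the same ending before > may have that shared tail trimmed from the earlier one. It is safest on input where the only repetition is whole sentences.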