How to detect and remove duplicate sentences in a string?
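Having just received help here with another question, I'm wondering whether my next problem can be solved as easily.
Basically, because of a poor PDF-to-Excel conversion, I've ended up with a lot of repeated sentences in each cell.
For example: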
$bad_string = "B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >";
$good_string = goodFunction($bad_string);
// expected result: 'B7R, B9R, B12R, B12M 430mm Disc 2005 >'
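How can this be done?
The condition is that the bad string is one sentence repeated X times. It never varies; it is as if the sentence had been copied and pasted several times (because of the poor PDF-to-Excel conversion).
Is there a solution?
I would use preg_replace. I'm assuming the repeated strings are contiguous.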
$bad_string = "B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >";
echo preg_replace('~^(.*?)\1+$~', '$1', $bad_string);
Output:
B7R, B9R, B12R, B12M 430mm Disc 2005 >
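To apply the same pattern to many cells, it can be wrapped in a small helper. This is only a minimal sketch: the name `collapseRepeats` is made up for illustration, and it assumes a cell is either one unit repeated back to back or should be left untouched. The `s` flag is an addition so that `.` also matches newlines inside a cell.

```php
<?php
// Hypothetical helper: collapse a string made of one unit repeated two or
// more times back to back into a single copy of that unit.
function collapseRepeats(string $s): string
{
    // ^(.*?)  lazily captures the shortest candidate unit from the start
    // \1+$    requires the rest of the string to be that unit, repeated
    $result = preg_replace('~^(.*?)\1+$~s', '$1', $s);

    // preg_replace() returns null on error (e.g. backtrack limit reached);
    // in that case fall back to the unmodified input.
    return $result ?? $s;
}

$unit = "B7R, B9R, B12R, B12M 430mm Disc 2005 >";
echo collapseRepeats(str_repeat($unit, 6)), "\n";   // the unit, once
echo collapseRepeats("no repetition here"), "\n";   // no match, unchanged
```

Because the capture is lazy, the shortest repeating unit wins, so a single sentence that happens to be periodic is shortened as well (for instance `haha` becomes `ha`).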
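If each sentence must end with a > symbol, you can use this regex instead. It removes every sentence that occurs again later in the string, so only the last copy is kept.
(.*?>)(?=(?:.*?\1)+$)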
$bad_string = "foo B7R, B9R, B12R, B12M 430mm Disc 2005 > bar B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >";
echo preg_replace('~(.*?>)(?=(?:.*?\1)+$)~', '', $bad_string);
Output (the space after foo and the space before bar both survive, hence two spaces):
foo  bar B7R, B9R, B12R, B12M 430mm Disc 2005 >
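The lookahead version can be packaged the same way. The sketch below uses a made-up name, `dropEarlierCopies`, and a shortened input in which `A >` stands in for the real sentence; it assumes every sentence ends with `>`.

```php
<?php
// Hypothetical helper: remove each ">"-terminated sentence that occurs again
// later in the string, so only the last copy of it survives.
function dropEarlierCopies(string $s): string
{
    // (.*?>)            captures one candidate sentence up to and including ">"
    // (?=(?:.*?\1)+$)   succeeds only if that same text appears again later
    $result = preg_replace('~(.*?>)(?=(?:.*?\1)+$)~', '', $s);

    // Fall back to the input if preg_replace() reports an error (null).
    return $result ?? $s;
}

echo dropEarlierCopies("foo A > bar A >A >"), "\n"; // "foo  bar A >"
```

One caveat: the pattern may begin matching in the middle of a sentence, so a unique sentence sitting between duplicates can lose its trailing ` >`. For example, `A >B >A >` comes out as `BA >`. If the data can contain such cases, it is worth checking the results before relying on this pattern.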