How to get the most different strings from a list
I have a list containing many similar strings, for example:
$str = array('monkey eat a banana',
'dog eat a banana',
'cat devour an apple',
'cat dine a coco'); //etc
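I want to extract the X strings from this array that are most different from each other.
Example: if I want to extract 3, the result would be 'monkey eat a banana', 'cat dine a coco' and 'cat devour an apple'.
How can I do this? I found the similar_text() function and I think I could use it, but how do I extract the strings for an arbitrary value of X?
Thanks for your advice.
PS: I am using this for SEO; the goal is to avoid duplicate content as much as possible.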
Testing with the sample code below leads to this conclusion: the strings with the lowest percentage from similar_text() are the most different ones.
$str = array('monkey eat a banana',
'dog eat a banana',
'cat devour an apple',
'cat dine a coco');
$len = count($str);
echo '<table width="100%">';
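// Compare every string with every string. Self-comparisons ($i == $j) are
// not skipped, so each string also appears against itself with 100%.
for ($i = 0; $i < $len; $i++) {
    for ($j = 0; $j < $len; $j++) {
        // $num is the number of matching characters; $percent is set by reference.
        $num = similar_text($str[$i], $str[$j], $percent);
        echo '<tr><td>' . $str[$i] . '<td>' . $str[$j] . '<td>' . strlen($str[$i]) . '<td>' . strlen($str[$j]) . '<td>' . $num . '<td>' . number_format($percent, 0);
    }
}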
echo '</table>';
The result is as follows:
string 1             string 2             len 1  len 2  matching chars  percentage
monkey eat a banana  monkey eat a banana  19     19     19              100
monkey eat a banana  dog eat a banana     19     16     14              80
monkey eat a banana  cat devour an apple  19     19     7               37
monkey eat a banana  cat dine a coco      19     15     5               29
dog eat a banana     monkey eat a banana  16     19     14              80
dog eat a banana     dog eat a banana     16     16     16              100
dog eat a banana     cat devour an apple  16     19     7               40
dog eat a banana     cat dine a coco      16     15     5               32
cat devour an apple  monkey eat a banana  19     19     7               37
cat devour an apple  dog eat a banana     19     16     7               40
cat devour an apple  cat devour an apple  19     19     19              100
cat devour an apple  cat dine a coco      19     15     9               53
cat dine a coco      monkey eat a banana  15     19     5               29
cat dine a coco      dog eat a banana     15     16     5               32
cat dine a coco      cat devour an apple  15     19     9               53
cat dine a coco      cat dine a coco      15     15     15              100
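The table lists every pair twice and leaves the reader to find the lowest percentage by eye. As a minimal sketch of the same idea, the helper below collects each unordered pair once and sorts the pairs by percentage, least similar first. The function name rankPairsBySimilarity() is hypothetical (it is not part of the original answer), and the arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper (not from the original answer): returns every unordered
// pair of strings with its similar_text() percentage, least similar first.
function rankPairsBySimilarity(array $strings): array
{
    $strings = array_values($strings);
    $count = count($strings);
    $pairs = [];
    for ($i = 0; $i < $count; $i++) {
        // Start at $i + 1 so each pair appears once and self-pairs are skipped.
        for ($j = $i + 1; $j < $count; $j++) {
            similar_text($strings[$i], $strings[$j], $percent);
            $pairs[] = [
                'a' => $strings[$i],
                'b' => $strings[$j],
                'percent' => $percent,
            ];
        }
    }
    // Ascending order: the most different pairs come first.
    usort($pairs, fn ($x, $y) => $x['percent'] <=> $y['percent']);
    return $pairs;
}

$str = array('monkey eat a banana',
    'dog eat a banana',
    'cat devour an apple',
    'cat dine a coco');

$pairs = rankPairsBySimilarity($str);
foreach ($pairs as $pair) {
    printf("%-20s %-20s %3.0f%%\n", $pair['a'], $pair['b'], $pair['percent']);
}
```

Going by the percentages in the table, the first row printed should be the pair 'monkey eat a banana' / 'cat dine a coco' at about 29%.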
Hope this helps.
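$str = array(
    'cat devour an apple',
    'dog eat a banana',
    'monkey eat a banana',
    'cat dine a coco',
); //etc

// For each string, sum its similarity to every other string.
$overall_scores = [];
foreach ($str as $i => $s) {
    $overall_scores[$i] = 0;
    foreach ($str as $j => $d) {
        if ($i != $j) {
            // similar_text() returns the number of matching characters.
            $overall_scores[$i] += similar_text($s, $d);
        }
    }
}

// Lowest total first: these strings are the least similar to the rest.
asort($overall_scores);

// Keep the indexes of the $x lowest-scoring strings.
$x = 3;
$results_index = array_slice(array_keys($overall_scores), 0, $x);

$result_str = [];
foreach ($results_index as $index) {
    $result_str[] = $str[$index];
}
var_dump($result_str);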
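Summing the scores ranks each string against the whole list, which is not quite the same as picking strings that differ from each other. One possible alternative, shown here only as a sketch, is a greedy selection: start with the least similar pair, then keep adding the string whose closest already-chosen neighbour is the least similar. The function name pickMostDifferent() is hypothetical, the approach is a heuristic and not guaranteed to find the best possible set, and the arrow functions assume PHP 7.4 or later.

```php
<?php
// Hypothetical helper, not part of the original answer: greedily picks $x
// strings so that each added string is as unlike the chosen ones as possible.
function pickMostDifferent(array $strings, int $x): array
{
    $strings = array_values(array_unique($strings));
    $n = count($strings);
    if ($x <= 0) {
        return [];
    }
    if ($x >= $n) {
        return $strings;
    }

    // Pairwise similarity in percent, computed once, plus the least similar pair.
    $sim = [];
    $seed = null;
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            similar_text($strings[$i], $strings[$j], $percent);
            $sim[$i][$j] = $sim[$j][$i] = $percent;
            if ($seed === null || $percent < $sim[$seed[0]][$seed[1]]) {
                $seed = [$i, $j];
            }
        }
    }

    // Start from the least similar pair (or one of its members when $x is 1).
    $chosen = ($x === 1) ? [$seed[0]] : $seed;

    while (count($chosen) < $x) {
        $bestIndex = null;
        $bestScore = INF;
        for ($c = 0; $c < $n; $c++) {
            if (in_array($c, $chosen, true)) {
                continue;
            }
            // Score a candidate by its similarity to the closest chosen string.
            $score = max(array_map(fn ($k) => $sim[$c][$k], $chosen));
            if ($score < $bestScore) {
                $bestScore = $score;
                $bestIndex = $c;
            }
        }
        $chosen[] = $bestIndex;
    }

    return array_map(fn ($k) => $strings[$k], $chosen);
}

$str = array('monkey eat a banana',
    'dog eat a banana',
    'cat devour an apple',
    'cat dine a coco');

var_dump(pickMostDifferent($str, 3));
```

For the four sample strings and X = 3, this should select 'monkey eat a banana', 'cat dine a coco' and 'cat devour an apple', the same three strings named in the question. The pairwise comparison grows quadratically with the list size, so for very long lists the similarity step would likely be the bottleneck.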