Finding similar strings in array
I need to use similar_text() on an array of values that looks like this:
$strings = ["lawyer" => 3, "business" => 3, "lawyers" => 1, "a" => 3];
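What I want to do is find words in the array above that are nearly identical, in this case lawyer and lawyers, and add their counts together in a new array.
So lawyer would be 4, because the count for lawyers would be attributed to the original string lawyer.
Bear in mind that this array will only ever contain single words, and its length is unspecified: it could range from 1 to >99 entries.
I had no idea where to start, so I took a stab at it with nested foreach loops, shown below, but the output is not what I expected.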
foreach ( $strings as $key_one => $count_one ) {
    foreach ( $strings as $key_two => $count_two ) {
        similar_text($key_two, $key_one, $percent);
        if ($percent > 80) {
            if(!isset($counts[$key_one])) {
                $counts[$key_one] = $count_one;
            } else {
                $counts[$key_one] += $count_two;
            }
        }
    }
}
Note: the match threshold for this example is 80% (the match between lawyer and lawyers is ~92%).
This ends up giving me something like the following:
Array
(
[lawyer] => 4
[business] => 3
[a] => 3
[lawyers] => 2
)
Whereas what I need is:
Array
(
[lawyer] => 4
[business] => 3
[a] => 3
)
Notice that I need it to actually remove lawyers and add its count to lawyer.
You could always use
unset( $counts[$key_two] ) ;
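Your difficulty is that just as lawyer is similar to lawyers, lawyers is also similar to lawyer. So each of them has its count bumped by the other.
Try this: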
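A quick check makes the two-way match visible. This is only a sketch; the variable names $forward and $backward are made up for illustration:

```php
<?php
// similar_text() writes the similarity percentage into its third argument.
similar_text("lawyer", "lawyers", $forward);
similar_text("lawyers", "lawyer", $backward);

// Both directions clear the 80% threshold, so in the nested loops
// each word ends up adding the other's count to its own.
var_dump($forward > 80, $backward > 80);
```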
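$counts = array();
foreach ( $strings as $key_one => &$count_one ) {
    if ($count_one == 0) continue; // skip it if we've already processed it
    if (!isset($counts[$key_one])) {
        $counts[$key_one] = $count_one;
        $count_one = 0; // zero it so the inner loop does not add it a second time
    }
    foreach ( $strings as $key_two => &$count_two ) {
        similar_text($key_two, $key_one, $percent);
        if ($percent > 80) {
            $counts[$key_one] += $count_two;
            $count_two = 0; // mark this word as consumed
        }
    }
}
unset($count_one, $count_two); // break the references left over from the loops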
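The downside of this is that it modifies the original $strings array, which may not be desirable. Here is another approach that keeps track of the already-processed strings in a separate hash: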
$already = $counts = array(); // not really necessary, but nice to init
foreach ( $strings as $key_one => $count_one ) {
    if (isset($already[$key_one])) continue; // skip if already merged into an earlier word
    $counts[$key_one] = $count_one; // by definition this should be new
    $already[$key_one] = true; // so the inner loop does not count it twice
    foreach ( $strings as $key_two => $count_two ) {
        if (isset($already[$key_two])) continue; // already counted somewhere
        similar_text($key_two, $key_one, $percent);
        if ($percent > 80) {
            $counts[$key_one] += $count_two;
            $already[$key_two] = true;
        }
    }
}
I would recommend the second solution.
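If you need this in more than one place, the second approach can be wrapped in a function with a configurable threshold. This is a minimal sketch, not a definitive implementation: the name merge_similar_counts and the $threshold parameter are hypothetical, and it assumes the first word encountered in a group is the one you want to keep.

```php
<?php
// Hypothetical helper: folds the count of each word into the first earlier
// word whose similar_text() percentage against it exceeds $threshold.
function merge_similar_counts(array $strings, float $threshold = 80.0): array
{
    $counts = [];
    $merged = []; // words that have already been counted
    foreach ($strings as $word => $count) {
        if (isset($merged[$word])) {
            continue; // already folded into an earlier word
        }
        $counts[$word] = $count;
        $merged[$word] = true;
        foreach ($strings as $other => $otherCount) {
            if (isset($merged[$other])) {
                continue;
            }
            // Cast to string because PHP turns numeric-looking keys into integers.
            similar_text((string) $other, (string) $word, $percent);
            if ($percent > $threshold) {
                $counts[$word] += $otherCount;
                $merged[$other] = true;
            }
        }
    }
    return $counts;
}

$strings = ["lawyer" => 3, "business" => 3, "lawyers" => 1, "a" => 3];
print_r(merge_similar_counts($strings));
```

Because the input array is passed by value, $strings is left untouched, and the threshold can be tuned per call, e.g. merge_similar_counts($strings, 90).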