删除 PHP 中区间数组中的封闭区间

Question

我有这样一个区间数组，按下界排序（$a[$i] <= $a[$i+1] 对应每个 $i），key l 是下限，键 h 是上限，我想删除所有间隔被较大间隔包围的行。

$a[0] = array('l' => 123, 'h'=>241);
$a[1] = array('l' => 250, 'h'=>360);
$a[2] = array('l' => 280, 'h'=>285);
$a[3] = array('l' => 310, 'h'=>310);
$a[4] = array('l' => 390, 'h'=>400);

所以我想得到的结果是

$a[0] = array('l' => 123, 'h'=>241);
$a[1] = array('l' => 250, 'h'=>360);
$a[2] = array('l' => 390, 'h'=>400);

这就是我尝试的

function dup($a){
  $c = count($a)-1;
  for ($i = $c; $i > 0; $i --){
    while ($a[$i]['h'] <= $a[$i-1]['h']){
        unset($a[$i]);
    }
  }
$a = array_values($a);
}

Answer 1

一种简单的方法，也许不是您想要的，但至少应该为您指明正确的方向。如果需要我可以完善它，只是有点忙，不想留下未回答的问题..

$out = [];

foreach ($a as $valA)
{
  $found = false;

  foreach ($a as $valB)
  {
    if (($valA['l'] > $valB['l']) && ($valA['h'] < $valB['h']))
    {
      $found = true;
      break;
    }
  }

  if (!$found)
  {
    $out[] = $valA;
  }
}

这完全未经测试，但最终应该只有 $out 中唯一的（大）范围。我在评论中提到的重叠未处理。

Answer 2

问题是 while 循环中缺少 break

function dup($a){
  $c = count($a)-1;
  for ($i = $c; $i > 0; $i --){
     while ($a[$i]['h'] <= $a[$i-1]['h']){
      unset($a[$i]);
      break; //here
     }
  }
  $a = array_values($a);
}

Answer 3

这是代码

function sort_by_low($item1,$item2){
    if($item1['l'] == $item2['l'])
        return 0;
    return ($item1['l']>$item2['l'])? -1:1;
}

usort($a,'sort_by_low');

for($i=0; $i<count($a); $i++){
  for($j=$i+1; $j<count($a);$j++){
     if($a[$i][l]<=$a[$j]['l'] && $a[$i][h]>=$a[$j]['h']){
        unset($a[$j]);
     }
  }
}

$a=array_values($a);

Answer 4

这是工作代码（已测试）

$result = array();
usort($a, function ($item1, $item2) {
    if ($item1['l'] == $item2['l']) return 0;
    return $item1['l'] < $item2['l'] ? -1 : 1;
});

foreach ($a as $element) {
    $exists = false;
    foreach ($result as $r) {
        if (($r['l'] < $element['l'] && $r['h'] > $element['h'])) {
            $exists = true;
            break;
        }
    }
    if (!$exists) {
        $result[] = $element;
    }
}

$result 将包含所需的结果

Answer 5

想到的第一个答案是由其他贡献者给出的不同变体：对于每个间隔，在每个间隔上循环寻找更大的封闭间隔。它易于理解和编写，而且确实有效。

这基本上是 n2 顺序，这意味着对于 n 个间隔，我们将进行 n*n 次循环。可以有一些技巧来优化它：

当我们在嵌套循环中找到一个封闭区间时中断，如 user3137702 的回答，因为如果我们找到至少一个封闭区间，继续下去是没有用的
避免在嵌套循环中循环相同的区间，因为我们知道区间不能严格包含在自身中（不重要）
避免在嵌套循环中对已排除的间隔进行循环（可能会产生重大影响）
按width = (h - l)升序循环区间（全局循环），因为越小的区间越有可能被其他区间包围，我们越早消除区间，下一个循环越有效（在我看来也很重要）
按宽度降序搜索封闭区间（嵌套循环），因为较大的区间更有可能包含其他区间（我认为它也会产生重大影响）
可能还有很多暂时没有想到的事情

现在让我说:

如果我们时不时要计算几个间隔，那么优化并不重要，而当前接受的 user3137702 的答案可以解决问题
为了开发合适的算法，无论如何有必要研究我们必须处理的数据的特征：在我们面前的情况下，区间分布如何？是否有很多封闭区间？这可以帮助从上面的列表中选择最有用的技巧。

出于教育目的，我想知道我们是否可以开发一种不同的算法来避免 n*n 顺序，随着您增加要计算的间隔数，运行宁时间必然会很快逐渐恶化。

"Virtual rule"算法

我想象这个算法我称之为"virtual rule"。

在虚拟规则上放置区间的起点和终点
运行沿规则升序通过点
在运行期间，注册开放与否
当一个间隔开始和结束而另一个间隔之前打开并且仍然打开时，我们可以说它是封闭的
所以当一个间隔结束时，检查它是否在其他当前打开的间隔之一之后打开，以及它是否在此间隔之前严格关闭。如果是，则附上！

我不认为这是最好的解决方案。但我们可以假设这比基本方法更快，因为尽管在循环期间要进行许多测试，但这是 n 顺序。

代码示例

我写了评论以使其尽可能清楚。

<?php
    function removeEnclosedIntervals_VirtualRule($a, $debug = false)
    {
        $rule = array();

        // place one point on a virtual rule for each low or up bound, refering to the interval's index in $a
        // virtual rule has 2 levels because there can be more than one point for a value
        foreach($a as $i => $interval)
        {
            $rule[$interval['l']][] = array('l', $i);
            $rule[$interval['h']][] = array('h', $i);
        }

        // used in the foreach loop
        $open = array();
        $enclosed = array();

        // loop through the points on the ordered virtual rule
        ksort($rule);
        foreach($rule as $points)
        {
            // Will register open intervals
            // When an interval starts and ends while another was opened before and is still open, it is enclosed

            // starts
            foreach($points as $point)
                if($point[0] == 'l')
                    $open[$point[1]] = $point[1]; // register it as open

            // ends
            foreach($points as $point)
            {
                if($point[0] == 'h')
                {
                    unset($open[$point[1]]); // UNregister it as open

                    // was it opened after a still open interval ?
                    foreach($open as $i)
                    {
                        if($a[$i]['l'] < $a[$point[1]]['l'])
                        {
                            // it is enclosed.

                            // is it *strictly* enclosed ?
                            if($a[$i]['h'] > $a[$point[1]]['h'])
                            {
                                // so this interval is strictly enclosed
                                $enclosed[$point[1]] = $point[1];

                                if($debug)
                                    echo debugPhrase(
                                        $point[1], //           $iEnclosed
                                        $a[$point[1]]['l'], //  $lEnclosed
                                        $a[$point[1]]['h'], //  $hEnclosed

                                        $i, //                  $iLarger
                                        $a[$i]['l'], //         $lLarger
                                        $a[$i]['h'] //          $hLarger
                                    );

                                break;
                            }
                        }
                    }
                }
            }
        }

        // obviously
        foreach($enclosed as $i)
            unset($a[$i]);

        return $a;
    }
?>

以基本方法为基准

它运行s 测试运行domly 生成的间隔

基本方法毫无疑问有效。比较这两种方法的结果可以让我预防 "VirtualRule" 方法有效，因为据我测试，它返回相同的结果

// * include removeEnclosingIntervals_VirtualRule function *

// arbitrary range for intervals start and end
// Note that it could be interesting to do benchmarking with different MIN and MAX values !
define('MIN', 0);
define('MAX', 500);

// Benchmarking params
define('TEST_MAX_NUMBER', 100000);
define('TEST_BY_STEPS_OF', 100);

// from http://php.net/manual/en/function.microtime.php
// used later for benchmarking purpose
function microtime_float()
{
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

function debugPhrase($iEnclosed, $lEnclosed, $hEnclosed, $iLarger, $lLarger, $hLarger)
{
    return '('.$iEnclosed.')['.$lEnclosed.' ; '.$hEnclosed.'] is strictly enclosed at least in ('.$iLarger.')['.$lLarger.' ; '.$hLarger.']'.PHP_EOL;
}

// 2 foreach loops solution (based on user3137702's *damn good* work ;) and currently accepted answer)
function removeEnclosedIntervals_Basic($a, $debug = false)
{
    foreach ($a as $i => $valA)
    {
        $found = false;

        foreach ($a as $j => $valB)
        {
            if (($valA['l'] > $valB['l']) && ($valA['h'] < $valB['h']))
            {
                $found = true;

                if($debug)
                    echo debugPhrase(
                        $i, //                  $iEnclosed
                        $a[$i]['l'], //         $lEnclosed
                        $a[$i]['h'], //         $hEnclosed

                        $j, //                  $iLarger
                        $a[$j]['l'], //         $lLarger
                        $a[$j]['h'] //          $hLarger
                    );

                break;
            }
        }

        if (!$found)
        {
            $out[$i] = $valA;
        }
    }

    return $out;
}

// runs a benchmark with $number intervals
function runTest($number)
{
    // Generating a random set of intervals with values between MIN and MAX
    $randomSet = array();
    for($i=0; $i<$number; $i++)
        // avoiding self-closing intervals
        $randomSet[] = array(
            'l' => ($l = mt_rand(MIN, MAX-2)),
            'h' => mt_rand($l+1, MAX)
        );

    /* running the two methods and comparing results and execution time */

    // Basic method
    $start = microtime_float();
    $Basic_result = removeEnclosedIntervals_Basic($randomSet);
    $end = microtime_float();
    $Basic_time = $end - $start;

    // VirtualRule
    $start = microtime_float();
    $VirtualRule_result = removeEnclosedIntervals_VirtualRule($randomSet);
    $end = microtime_float();
    $VirtualRule_time = $end - $start;

    // Basic method works for sure.
    // If results are the same, comparing execution time. If not, sh*t happened !
    if(md5(var_export($VirtualRule_result, true)) == md5(var_export($VirtualRule_result, true)))
        echo $number.';'.$Basic_time.';'.$VirtualRule_time.PHP_EOL;
    else
    {
        echo '/;/;/;Work harder, results are not the same ! Cant say anything !'.PHP_EOL;
        stop;
    }
}

// CSV header
echo 'Number of intervals;Basic method exec time (s);VirtualRule method exec time (s)'.PHP_EOL;

for($n=TEST_BY_STEPS_OF; $n<TEST_MAX_NUMBER; $n+=TEST_BY_STEPS_OF)
{
    runTest($n);
    flush();
}

结果（对我来说）

正如我所想，获得了明显不同的性能。
我运行在 PHP5 的酷睿 i7 计算机和 PHP7 的（旧）AMD 四核计算机上进行了测试。在我的系统上，这两个版本之间的性能存在明显差异！原则上可以用 PHP 版本的差异来解释，因为运行ning PHP5 的计算机功能更强大...

删除 PHP 中区间数组中的封闭区间

Removing enclosed intervals in an array of intervals in PHP

php

arrays

intervals