Merging duplicate arrays with count

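I'm processing a large amount of data and need to merge arrays when they are duplicates. When rows are merged, I need to add a count to the resulting array.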

array:3721 [▼
  0 => array:3 [▼
    "subscriber" => "gmail.com."
    "code" => 554
    "status" => 50
  ]
  1 => array:3 [▼
    "subscriber" => "apied.be"
    "code" => 550
    "status" => 51
  ]
  2 => array:3 [▼
    "subscriber" => "beton-dobbelaere.be"
    "code" => 550
    "status" => 50
  ]
  3 => array:3 [▼
    "subscriber" => "live.be"
    "code" => 550
    "status" => 51
  ]
  4 => array:3 [▼
    "subscriber" => "hotmail.be"
    "code" => 550
    "status" => 51
  ]
  5 => array:3 [▼
    "subscriber" => "telenet.be"
    "code" => 550
    "status" => 50
  ]
  6 => array:3 [▼
    "subscriber" => "telenet.be"
    "code" => 550
    "status" => 55
  ]
  7 => array:3 [▼
    "subscriber" => "telenet.be"
    "code" => 550
    "status" => 51
  ]
  8 => array:3 [▼
    "subscriber" => "telenet.be"
    "code" => 550
    "status" => 51
  ]

The result should look something like this:

array:3721 [▼
  0 => array:3 [▼
    "subscriber" => "gmail.com."
    "code" => 554
    "status" => 50
    "amount" => 1
  ]
  1 => array:3 [▼
    "subscriber" => "apied.be"
    "code" => 550
    "status" => 51
    "amount" => 1
  ]
  2 => array:3 [▼
    "subscriber" => "beton-dobbelaere.be"
    "code" => 550
    "status" => 50
    "amount" => 1
  ]
  3 => array:3 [▼
    "subscriber" => "live.be"
    "code" => 550
    "status" => 51
    "amount" => 1
  ]
  4 => array:3 [▼
    "subscriber" => "hotmail.be"
    "code" => 550
    "status" => 51
    "amount" => 1
  ]
  5 => array:3 [▼
    "subscriber" => "telenet.be"
    "code" => 550
    "status" => 50
    "amount" => 1
  ]
  6 => array:3 [▼
    "subscriber" => "telenet.be"
    "code" => 550
    "status" => 55
    "amount" => 1
  ]
  7 => array:3 [▼
    "subscriber" => "telenet.be"
    "code" => 550
    "status" => 51
    "amount" => 2
  ]

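When I merge this example with

array_unique($hardbounces, SORT_REGULAR);

I get about 534 results instead of 3721, which is good, except that I also need to know how many times each row occurred. It also has to be reasonably efficient, because the result set can be very large (much larger than this).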
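`array_unique()` throws the duplicates away without reporting how many there were. One alternative, sketched here as an assumption rather than anything from the question, is to turn each row into a string with `serialize()` and let `array_count_values()` do the counting. `countIdenticalRows` is a hypothetical name. Note that this comparison is stricter than `SORT_REGULAR`: rows only match if their keys are in the same order and their values have the same types (`550` and `"550"` count as different).

```php
<?php
// Hypothetical helper: count identical rows by serializing each one,
// because array_count_values() only accepts string or integer values.
function countIdenticalRows(array $rows)
{
    // Keys are the serialized rows, in order of first occurrence.
    $counts = array_count_values(array_map('serialize', $rows));
    $result = [];
    foreach ($counts as $serialized => $amount) {
        $row = unserialize($serialized);
        $row['amount'] = $amount;
        $result[] = $row;
    }
    return $result;
}

$rows = [
    ['subscriber' => 'gmail.com.', 'code' => 554, 'status' => 50],
    ['subscriber' => 'telenet.be', 'code' => 550, 'status' => 51],
    ['subscriber' => 'telenet.be', 'code' => 550, 'status' => 55],
    ['subscriber' => 'telenet.be', 'code' => 550, 'status' => 51],
];
$r = countIdenticalRows($rows);
print_r($r);
```

Each row is serialized once and counted with a hash lookup, so there is no nested loop over the already-merged rows.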

After that, the result also needs to be sorted by domain and amount.
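The question does not say which direction either sort should go. Assuming "domain" means the `subscriber` field, that domains go A to Z, and that higher amounts come first within the same domain, a two-level `usort()` comparison is one way to do it. `sortBySubscriberAndAmount` is a hypothetical name, and the rows are assumed to already carry an `amount` field.

```php
<?php
// Hypothetical helper: sort by subscriber (A to Z), then by amount
// (highest first) for rows that share a subscriber.
function sortBySubscriberAndAmount(array $rows)
{
    usort($rows, function ($a, $b) {
        $bySubscriber = strcmp($a['subscriber'], $b['subscriber']);
        if ($bySubscriber !== 0) {
            return $bySubscriber;
        }
        if ($a['amount'] == $b['amount']) {
            return 0;
        }
        // Descending amount: the larger amount sorts first.
        return ($a['amount'] < $b['amount']) ? 1 : -1;
    });
    return $rows;
}

$counted = [
    ['subscriber' => 'telenet.be', 'code' => 550, 'status' => 50, 'amount' => 1],
    ['subscriber' => 'gmail.com.', 'code' => 554, 'status' => 50, 'amount' => 1],
    ['subscriber' => 'telenet.be', 'code' => 550, 'status' => 51, 'amount' => 2],
];
$s = sortBySubscriberAndAmount($counted);
print_r($s);
```

The comparison avoids the `<=>` operator so it also runs on the PHP 5.x versions that Laravel 5.1 supports.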

I'm using Laravel 5.1. If necessary, I can convert the array to a collection so that I can use the helper functions.

I fixed it myself with a foreach loop, and it runs faster than I expected (0.3179 seconds):

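$merged = [];
// Iterate by value: the original by-reference loop (&$hardbounce) was not
// needed and would leave $hardbounce pointing at the last input row afterwards.
foreach ($hardbounces as $hardbounce) {
    if (empty($merged)) {
        // First iteration: nothing to compare against yet.
        $merged[] = $hardbounce;
    } else {
        $i = 0;
        foreach ($merged as $key => $merge) {
            $i++;
            // Same subscriber, code and status: count it and stop searching.
            if ($hardbounce['subscriber'] == $merge['subscriber']
                && $hardbounce['code'] == $merge['code']
                && $hardbounce['status'] == $merge['status']) {
                $merged[$key]['amount']++;
                break;
            }
            // Reached the last merged row without a match: this row is new.
            if (count($merged) == $i) {
                $merged[] = $hardbounce;
            }
        }
    }
}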

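I loop over the bounces; if $merged is empty (the first iteration), it simply adds the hardbounce. From then on, I loop over the new array and check for duplicates. When one is found, I just increment the amount by one and break out of the inner foreach; otherwise it would still end up adding the duplicate. After that check, I test whether we have reached the last iteration: if the loop still hasn't broken out by then, the hardbounce doesn't exist yet, so it gets added.

To be clear, I make sure every row already has an amount of 1 before this loop runs, rather than adding it during the loop.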
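The answer does not show how the `amount` of 1 is preset. One way, sketched here as an assumption, is a single `array_map()` pass before the loop; without it, `$merged[$key]['amount']++` on a missing key raises an undefined-index notice (a warning in PHP 8) and the count ends up one too low. `withInitialAmount` is a hypothetical name.

```php
<?php
// Hypothetical helper: give every row an initial 'amount' of 1
// before running the merge loop, as the answer assumes.
function withInitialAmount(array $rows)
{
    return array_map(function ($row) {
        $row['amount'] = 1;
        return $row;
    }, $rows);
}

$hardbounces = withInitialAmount([
    ['subscriber' => 'apied.be', 'code' => 550, 'status' => 51],
    ['subscriber' => 'live.be', 'code' => 550, 'status' => 51],
]);
print_r($hardbounces);
```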

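I'm not sure how efficient this code is with a lot of items (by the way, you didn't mention the order of magnitude), but I think it's more efficient than converting the array to collections.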

<?php
$arr = [
      0 =>  [
        "subscriber" => "gmail.com.",
        "code" => 554,
        "status" => 50,
      ],
      1 =>  [
        "subscriber" => "apied.be",
        "code" => 550,
        "status" => 51,
      ],
      2 =>  [
        "subscriber" => "beton-dobbelaere.be",
        "code" => 550,
        "status" => 50,
      ],
      3 =>  [
        "subscriber" => "live.be",
        "code" => 550,
        "status" => 51,
      ],
      4 =>  [
        "subscriber" => "hotmail.be",
        "code" => 550,
        "status" => 51,
      ],
      5 =>  [
        "subscriber" => "telenet.be",
        "code" => 550,
        "status" => 50,
      ],
      6 =>  [
        "subscriber" => "telenet.be",
        "code" => 550,
        "status" => 55,
      ],
      7 =>  [
        "subscriber" => "telenet.be",
        "code" => 550,
        "status" => 51,
      ],
      8 =>  [
        "subscriber" => "telenet.be",
        "code" => 550,
        "status" => 51,
      ],
];

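$res = [];
foreach($arr as $element) {
    // Group on all three fields, not just 'subscriber', so rows with a
    // different code or status stay separate (as in the expected output).
    $key = $element['subscriber'] . '|' . $element['code'] . '|' . $element['status'];
    if(!isset($res[$key])) {
        // Store the row itself plus its count, rather than nesting it.
        $res[$key] = $element + ['amount' => 1];
    } else {
        $res[$key]['amount']++;
    }
}

// Drop the string keys to get a plain list again.
var_dump(array_values($res));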

Try this:

<?php
    $input = array(
      0 => array(
        "subscriber" => "gmail.com.",
        "code" => 554,
        "status" => 50),
      1 => array(
        "subscriber" => "apied.be",
        "code" => 550,
        "status" => 51),
      2 => array(
        "subscriber" => "beton-dobbelaere.be",
        "code" => 550,
        "status" => 50),
      3 => array(
        "subscriber" => "live.be",
        "code" => 550,
        "status" => 51),
      4 => array(
        "subscriber" => "hotmail.be",
        "code" => 550,
        "status" => 51),
      5 => array(
        "subscriber" => "telenet.be",
        "code" => 550,
        "status" => 50),
      6 => array(
        "subscriber" => "telenet.be",
        "code" => 550,
        "status" => 55),
      7 => array(
        "subscriber" => "telenet.be",
        "code" => 550,
        "status" => 51),
      8 => array(
        "subscriber" => "telenet.be",
        "code" => 550,
        "status" => 51)
    );


    /**
     *@param array $counted The array already counted or NULL
     *@param array $new The array to count or to merge with the counted $counted
     */
    function merge_xor_count(array $counted = NULL, array $new){
        if($counted === NULL){
            $counted = array();
        }
        foreach($new as $keyNew => $valueNew){
            $matches = false;
            foreach($counted as $keyOut => $valueOut){
                if ($valueOut['subscriber'] == $valueNew['subscriber'] && $valueOut['code'] == $valueNew['code'] &&
                    $valueOut['status'] == $valueNew['status']){
                    $matches = $keyOut;
                    // Stop at the first match instead of scanning the rest.
                    break;
                }
            }
            if($matches !== false){
                $counted[$matches]['amount']++;
            }
            else{
                if(!isset($valueNew['amount'])) $valueNew['amount'] = 1;
                $counted[] = $valueNew;
            }
        }
        return $counted;
    }

    $output = merge_xor_count(NULL, $input);

    // print_r() returns true, so appending "\n" to its result printed nothing.
    print_r($output);
    echo "\n";

    $output = merge_xor_count($output, $input);

    print_r($output);
    echo "\n";


    ?>
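`merge_xor_count()` accepts an earlier result, which helps when the data arrives in chunks, but each call still scans the whole `$counted` array for every new row, and a matching row always adds 1 even if the incoming row already carries an `amount`. A keyed variant is sketched below under the assumption that rows may or may not already have an `amount`: it keeps a lookup map between calls and adds the incoming amount instead. `merge_counts` and the `subscriber|code|status` key format are hypothetical, and the key assumes `|` never appears in the values.

```php
<?php
// Hypothetical incremental counter: $counted is keyed by
// "subscriber|code|status" and can be passed back in for the next batch.
function merge_counts(array $counted, array $new)
{
    foreach ($new as $row) {
        $key = $row['subscriber'] . '|' . $row['code'] . '|' . $row['status'];
        // Rows from an earlier count keep their amount; raw rows count as 1.
        $amount = isset($row['amount']) ? $row['amount'] : 1;
        if (isset($counted[$key])) {
            $counted[$key]['amount'] += $amount;
        } else {
            $row['amount'] = $amount;
            $counted[$key] = $row;
        }
    }
    return $counted;
}

$batch1 = [
    ['subscriber' => 'telenet.be', 'code' => 550, 'status' => 51],
    ['subscriber' => 'telenet.be', 'code' => 550, 'status' => 51],
];
$batch2 = [
    ['subscriber' => 'telenet.be', 'code' => 550, 'status' => 51],
    ['subscriber' => 'live.be', 'code' => 550, 'status' => 51],
];

$counted = merge_counts([], $batch1);
$counted = merge_counts($counted, $batch2);
$v = array_values($counted);
print_r($v);
```

Call `array_values()` only on the final result; the string keys are what keep the lookups between batches cheap.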