Merging duplicate arrays with count
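I am working with a large amount of data, and I need to merge arrays when they are duplicates. When rows are merged, I need to add a count to the resulting array.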
array:3721 [
  0 => array:3 [
    "subscriber" => "gmail.com."
    "code" => 554
    "status" => 50
  ]
  1 => array:3 [
    "subscriber" => "apied.be"
    "code" => 550
    "status" => 51
  ]
  2 => array:3 [
    "subscriber" => "beton-dobbelaere.be"
    "code" => 550
    "status" => 50
  ]
  3 => array:3 [
    "subscriber" => "live.be"
    "code" => 550
    "status" => 51
  ]
  4 => array:3 [
    "subscriber" => "hotmail.be"
    "code" => 550
    "status" => 51
  ]
  5 => array:3 [
    "subscriber" => "telenet.be"
    "code" => 550
    "status" => 50
  ]
  6 => array:3 [
    "subscriber" => "telenet.be"
    "code" => 550
    "status" => 55
  ]
  7 => array:3 [
    "subscriber" => "telenet.be"
    "code" => 550
    "status" => 51
  ]
  8 => array:3 [
    "subscriber" => "telenet.be"
    "code" => 550
    "status" => 51
  ]
  ...
]
This should become something like:
array:3721 [
  0 => array:4 [
    "subscriber" => "gmail.com."
    "code" => 554
    "status" => 50
    "amount" => 1
  ]
  1 => array:4 [
    "subscriber" => "apied.be"
    "code" => 550
    "status" => 51
    "amount" => 1
  ]
  2 => array:4 [
    "subscriber" => "beton-dobbelaere.be"
    "code" => 550
    "status" => 50
    "amount" => 1
  ]
  3 => array:4 [
    "subscriber" => "live.be"
    "code" => 550
    "status" => 51
    "amount" => 1
  ]
  4 => array:4 [
    "subscriber" => "hotmail.be"
    "code" => 550
    "status" => 51
    "amount" => 1
  ]
  5 => array:4 [
    "subscriber" => "telenet.be"
    "code" => 550
    "status" => 50
    "amount" => 1
  ]
  6 => array:4 [
    "subscriber" => "telenet.be"
    "code" => 550
    "status" => 55
    "amount" => 1
  ]
  7 => array:4 [
    "subscriber" => "telenet.be"
    "code" => 550
    "status" => 51
    "amount" => 2
  ]
  ...
]
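When I deduplicate this example using
array_unique($hardbounces, SORT_REGULAR);
I get about 534 results instead of 3721, which is good, except that I also need to know the count for each row. It also has to be reasonably efficient, because the result set can be much larger.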
Afterwards, the result also needs to be sorted by domain and amount.
I am using Laravel 5.1. If necessary, I can convert the array to a collection so that the helper functions can be used.
I solved it myself with a foreach loop, and it runs faster than I expected (0.3179 seconds):
$merged = [];
foreach ($hardbounces as $hardbounce) {
    if (empty($merged)) {
        // First iteration: nothing to compare against yet.
        $merged[] = $hardbounce;
    } else {
        $i = 0;
        foreach ($merged as $key => $merge) {
            $i++;
            if ($hardbounce['subscriber'] == $merge['subscriber']
                && $hardbounce['code'] == $merge['code']
                && $hardbounce['status'] == $merge['status']) {
                // Duplicate found: bump its counter and stop searching.
                $merged[$key]['amount']++;
                break;
            }
            if (count($merged) == $i) {
                // Reached the last entry without a match: this row is new.
                $merged[] = $hardbounce;
            }
        }
    }
}
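I loop over the bounces; if $merged is empty (the first iteration), the hard bounce is simply added.
From then on, I loop over the new array and check whether the row is a duplicate. When it is, we just increment the amount by one and break out of the foreach; otherwise the duplicate would still end up being added.
After that, I check whether we have reached the last iteration. If the loop has not been broken out of by then, we should add the hard bounce, because it does not exist yet.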
To be clear, I made sure that 'amount' => 1 already exists on every row before this loop runs, rather than adding it during the loop.
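The answer does not show how the rows were given their starting amount. One way this could be done is sketched below; the sample rows and the use of array_map() are assumptions for illustration, not the author's actual code.

```php
<?php
// Hypothetical sample rows; the real $hardbounces would come from the application.
$hardbounces = [
    ['subscriber' => 'telenet.be', 'code' => 550, 'status' => 51],
    ['subscriber' => 'telenet.be', 'code' => 550, 'status' => 51],
    ['subscriber' => 'live.be',    'code' => 550, 'status' => 51],
];

// Give every row a starting count of 1 before any merging happens.
// The array union operator (+) keeps an existing 'amount' if a row already has one.
$hardbounces = array_map(function (array $row) {
    return $row + ['amount' => 1];
}, $hardbounces);

print_r($hardbounces[0]);
```

With the counter seeded this way, the `$merged[$key]['amount']++` line in the loop always has a value to increment.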
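I'm not sure how efficient this code is for a large number of items (by the way, you didn't mention the order of magnitude), but I think it is more efficient than converting the array to collections: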
$arr = [
    0 => [
        "subscriber" => "gmail.com.",
        "code" => 554,
        "status" => 50,
    ],
    1 => [
        "subscriber" => "apied.be",
        "code" => 550,
        "status" => 51,
    ],
    2 => [
        "subscriber" => "beton-dobbelaere.be",
        "code" => 550,
        "status" => 50,
    ],
    3 => [
        "subscriber" => "live.be",
        "code" => 550,
        "status" => 51,
    ],
    4 => [
        "subscriber" => "hotmail.be",
        "code" => 550,
        "status" => 51,
    ],
    5 => [
        "subscriber" => "telenet.be",
        "code" => 550,
        "status" => 50,
    ],
    6 => [
        "subscriber" => "telenet.be",
        "code" => 550,
        "status" => 55,
    ],
    7 => [
        "subscriber" => "telenet.be",
        "code" => 550,
        "status" => 51,
    ],
    8 => [
        "subscriber" => "telenet.be",
        "code" => 550,
        "status" => 51,
    ],
];
$res = [];
foreach ($arr as $element) {
    if (empty($res[$element['subscriber']])) {
        $res[$element['subscriber']] = [$element, 'count' => 1];
    } else {
        $res[$element['subscriber']]['count']++;
    }
}
var_dump($res);
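Note that keying on subscriber alone folds together rows that differ in code or status, so the four telenet.be rows above collapse into a single entry, whereas the question expects three. A minimal sketch of the same lookup idea with a composite key follows; the '|' separator, the variable names and the shortened sample data are assumptions for illustration.

```php
<?php
// Hypothetical sample rows shaped like the ones in the question.
$rows = [
    ['subscriber' => 'telenet.be', 'code' => 550, 'status' => 50],
    ['subscriber' => 'telenet.be', 'code' => 550, 'status' => 55],
    ['subscriber' => 'telenet.be', 'code' => 550, 'status' => 51],
    ['subscriber' => 'telenet.be', 'code' => 550, 'status' => 51],
];

$merged = [];
foreach ($rows as $row) {
    // Build one lookup key from all three fields; '|' is an assumed separator
    // that is not expected to occur inside a domain name.
    $key = $row['subscriber'] . '|' . $row['code'] . '|' . $row['status'];

    if (isset($merged[$key])) {
        // Seen before: only the counter changes.
        $merged[$key]['amount']++;
    } else {
        // First occurrence: store the row with a starting count of 1.
        $merged[$key] = $row + ['amount' => 1];
    }
}

// Drop the helper keys to get a plain 0-indexed list again.
$merged = array_values($merged);

print_r($merged);
```

Because each row needs only one array lookup, the work grows roughly linearly with the number of rows, rather than with rows times unique rows as in a nested loop.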
Try:
<?php
$input = array(
    0 => array(
        "subscriber" => "gmail.com.",
        "code" => 554,
        "status" => 50),
    1 => array(
        "subscriber" => "apied.be",
        "code" => 550,
        "status" => 51),
    2 => array(
        "subscriber" => "beton-dobbelaere.be",
        "code" => 550,
        "status" => 50),
    3 => array(
        "subscriber" => "live.be",
        "code" => 550,
        "status" => 51),
    4 => array(
        "subscriber" => "hotmail.be",
        "code" => 550,
        "status" => 51),
    5 => array(
        "subscriber" => "telenet.be",
        "code" => 550,
        "status" => 50),
    6 => array(
        "subscriber" => "telenet.be",
        "code" => 550,
        "status" => 55),
    7 => array(
        "subscriber" => "telenet.be",
        "code" => 550,
        "status" => 51),
    8 => array(
        "subscriber" => "telenet.be",
        "code" => 550,
        "status" => 51)
);

/**
 * @param array $counted The array already counted, or NULL
 * @param array $new     The array to count, or to merge into $counted
 * @return array The merged rows, each with an 'amount' counter
 */
function merge_xor_count(array $counted = NULL, array $new){
    if($counted === NULL){
        $counted = array();
    }
    foreach($new as $keyNew => $valueNew){
        $matches = false;
        foreach($counted as $keyOut => $valueOut){
            if ($valueOut['subscriber'] == $valueNew['subscriber'] && $valueOut['code'] == $valueNew['code'] &&
                $valueOut['status'] == $valueNew['status']){
                $matches = $keyOut;
                break; // rows in $counted are unique, so stop at the first match
            }
        }
        if($matches !== false){
            $counted[$matches]['amount']++;
        }
        else{
            if(!isset($valueNew['amount'])) $valueNew['amount'] = 1;
            $counted[] = $valueNew;
        }
    }
    return $counted;
}

// First pass: count the duplicates within $input.
$output = merge_xor_count(NULL, $input);
print_r($output);
echo "\n";
// Second pass: merge $input into the existing counts once more.
$output = merge_xor_count($output, $input);
print_r($output);
echo "\n";
?>
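The question also asks for the merged rows to be sorted by domain and amount, which none of the code above covers. A minimal sketch with usort() follows, assuming rows that already carry 'subscriber' and 'amount' keys. The sort direction (domain A to Z, then largest amount first) is an assumption, since the question does not specify it.

```php
<?php
// Hypothetical merged rows, shaped like the output of the merge step.
$merged = [
    ['subscriber' => 'telenet.be', 'code' => 550, 'status' => 50, 'amount' => 1],
    ['subscriber' => 'apied.be',   'code' => 550, 'status' => 51, 'amount' => 1],
    ['subscriber' => 'telenet.be', 'code' => 550, 'status' => 51, 'amount' => 2],
];

// Assumed order: domain A-Z first, then the largest amount first within a domain.
usort($merged, function (array $a, array $b) {
    $byDomain = strcmp($a['subscriber'], $b['subscriber']);
    if ($byDomain !== 0) {
        return $byDomain;
    }
    // Integer subtraction gives a negative, zero or positive result,
    // which is what usort() expects from its callback.
    return $b['amount'] - $a['amount'];
});

print_r($merged);
```

Swapping `$a` and `$b` in either comparison reverses that part of the ordering.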