查找人口最多的年份(最有效的解决方案)
Find the year with the highest population (most efficient solution)
给定两个数组; $births
包含表明某人出生时间的出生年份列表,$deaths
包含表明某人死亡时间的死亡年份列表,我们如何找到人口最多的年份?
例如给定以下数组:
$births = [1984, 1981, 1984, 1991, 1996];
$deaths = [1991, 1984];
人口最多的那一年应该是1996
,因为那一年有3
人在世,是所有那些年中人口最多的一年。
这是 运行 的数学公式:
| Birth | Death | Population |
|-------|-------|------------|
| 1981 | | 1 |
| 1984 | | 2 |
| 1984 | 1984 | 2 |
| 1991 | 1991 | 2 |
| 1996 | | 3 |
假设
我们可以安全地假设某人出生的那一年人口会增加一,而某人去世的那一年人口会减少一。所以在这个例子中,1984年出生了2个人,1984年死了1个人,也就是说那一年人口增加了1。
我们还可以安全地假设死亡人数永远不会超过出生人数,并且当人口为 0 时不会发生死亡。
我们还可以安全地假设 $deaths
和 $births
中的年份永远不会是负值或浮点值(它们总是大于 0).
We cannot assume that the arrays will be sorted or that there won't be duplicate values, however.
要求
我们必须编写一个函数来 return 人口最多的那一年,将这两个数组作为输入。该函数可能 return 0
、false
、""
或 NULL
(任何虚假值都是可接受的)如果输入数组为空,或者总体始终为 0。如果最高人口出现在多个年份,函数可能 return 达到最高人口的第一年或任何后续年份。
例如:
$births = [1997, 1997, 1997, 1998, 1999];
$deaths = [1998, 1999];
/* The highest population was 3 on 1997, 1998 and 1999, either answer is correct */
此外,包含解决方案的 Big O 会有所帮助。
我对此的最佳尝试如下:
function highestPopulationYear(Array $births, Array $deaths): Int {
sort($births);
sort($deaths);
$nextBirthYear = reset($births);
$nextDeathYear = reset($deaths);
$years = [];
if ($nextBirthYear) {
$years[] = $nextBirthYear;
}
if ($nextDeathYear) {
$years[] = $nextDeathYear;
}
if ($years) {
$currentYear = max(0, ...$years);
} else {
$currentYear = 0;
}
$maxYear = $maxPopulation = $currentPopulation = 0;
while(current($births) !== false || current($deaths) !== false || $years) {
while($currentYear === $nextBirthYear) {
$currentPopulation++;
$nextBirthYear = next($births);
}
while($currentYear === $nextDeathYear) {
$currentPopulation--;
$nextDeathYear = next($deaths);
}
if ($currentPopulation >= $maxPopulation) {
$maxPopulation = $currentPopulation;
$maxYear = $currentYear;
}
$years = [];
if ($nextBirthYear) {
$years[] = $nextBirthYear;
}
if ($nextDeathYear) {
$years[] = $nextDeathYear;
}
if ($years) {
$currentYear = min($years);
} else {
$currentYear = 0;
}
}
return $maxYear;
}
上面的算法应该在多项式时间内工作,因为它是最坏的 O(((n log n) * 2) + k)
其中 n
是每个数组中要排序的元素数,k
是出生年份数(因为我们知道 k
总是 k >= y
)其中 y
是死亡年数。但是,我不确定是否有更有效的解决方案。
My interests are purely in an improved Big O of computational complexity upon the existing algorithm. Memory complexity is of no concern. Nor is the runtime optimization. At least it's not a primary concern. Any minor/major runtime optimizations are welcome, but not the key factor here.
我认为我们可以有 O(n log n)
时间和 O(1)
额外的 space 通过首先排序,然后在我们迭代时保持当前人口和全局最大值。我尝试使用当年作为参考点,但逻辑似乎仍然有点棘手,所以我不确定它是否已完全解决。希望它可以提供有关该方法的想法。
JavaScript代码(counterexamples/bugs欢迎)
function f(births, deaths){
births.sort((a, b) => a - b);
deaths.sort((a, b) => a - b);
console.log(JSON.stringify(births));
console.log(JSON.stringify(deaths));
let i = 0;
let j = 0;
let year = births[i];
let curr = 0;
let max = curr;
while (deaths[j] < births[0])
j++;
while (i < births.length || j < deaths.length){
while (year == births[i]){
curr = curr + 1;
i = i + 1;
}
if (j == deaths.length || year < deaths[j]){
max = Math.max(max, curr);
console.log(`year: ${ year }, max: ${ max }, curr: ${ curr }`);
} else if (j < deaths.length && deaths[j] == year){
while (deaths[j] == year){
curr = curr - 1;
j = j + 1;
}
max = Math.max(max, curr);
console.log(`year: ${ year }, max: ${ max }, curr: ${ curr }`);
}
if (j < deaths.length && deaths[j] > year && (i == births.length || deaths[j] < births[i])){
year = deaths[j];
while (deaths[j] == year){
curr = curr - 1;
j = j + 1;
}
console.log(`year: ${ year }, max: ${ max }, curr: ${ curr }`);
}
year = births[i];
}
return max;
}
var input = [
[[1997, 1997, 1997, 1998, 1999],
[1998, 1999]],
[[1, 2, 2, 3, 4],
[1, 2, 2, 5]],
[[1984, 1981, 1984, 1991, 1996],
[1991, 1984, 1997]],
[[1984, 1981, 1984, 1991, 1996],
[1991, 1982, 1984, 1997]]
]
for (let [births, deaths] of input)
console.log(f(births, deaths));
如果年份范围 m
的顺序为 n
,我们可以将每年的计数存储在该范围内,时间复杂度为 O(n)
。如果我们想要花哨的,我们也可以有 O(n * log log m)
时间复杂度,通过使用 Y-fast trie 允许在 O(log log m)
时间内进行后继查找。
我们可以用桶排序在线性时间内解决这个问题。假设输入的大小是n,年的范围是m。
O(n): Find the min and max year across births and deaths.
O(m): Create an array of size max_yr - min_yr + 1, ints initialized to zero.
Treat the first cell of the array as min_yr, the next as min_yr+1, etc...
O(n): Parse the births array, incrementing the appropriate index of the array.
arr[birth_yr - min_yr] += 1
O(n): Ditto for deaths, decrementing the appropriate index of the array.
arr[death_yr - min_yr] -= 1
O(m): Parse your array, keeping track of the cumulative sum and its max value.
最大的累积最大值就是你的答案。
运行的时间是O(n+m),额外需要的space是O(m)。
如果 m 是 O(n),则这是 n 的线性解;即,如果年份范围的增长速度不超过出生和死亡人数的增长速度。对于现实世界的数据,这几乎肯定是正确的。
$births = [1984, 1981, 1984, 1991, 1996];
$deaths = [1991, 1984];
$years = array_unique(array_merge($births, $deaths));
sort($years);
$increaseByYear = array_count_values($births);
$decreaseByYear = array_count_values($deaths);
$populationByYear = array();
foreach ($years as $year) {
$increase = $increaseByYear[$year] ?? 0;
$decrease = $decreaseByYear[$year] ?? 0;
$previousPopulationTally = end($populationByYear);
$populationByYear[$year] = $previousPopulationTally + $increase - $decrease;
}
$maxPopulation = max($populationByYear);
$maxPopulationYears = array_keys($populationByYear, $maxPopulation);
$maxPopulationByYear = array_fill_keys($maxPopulationYears, $maxPopulation);
print_r($maxPopulationByYear);
这将解释平局年份的可能性,以及某人死亡的年份与某人的出生年份不对应的情况。
首先将出生和死亡汇总到地图 (year => population change
) 中,按键排序,然后计算 运行 人口。
这应该大约是 O(2n + n log n)
,其中 n
是出生人数。
$births = [1984, 1981, 1984, 1991, 1996];
$deaths = [1991, 1984];
function highestPopulationYear(array $births, array $deaths): ?int
{
$indexed = [];
foreach ($births as $birth) {
$indexed[$birth] = ($indexed[$birth] ?? 0) + 1;
}
foreach ($deaths as $death) {
$indexed[$death] = ($indexed[$death] ?? 0) - 1;
}
ksort($indexed);
$maxYear = null;
$max = $current = 0;
foreach ($indexed as $year => $change) {
$current += $change;
if ($current >= $max) {
$max = $current;
$maxYear = $year;
}
}
return $maxYear;
}
var_dump(highestPopulationYear($births, $deaths));
我用 O(n+m)
的内存需求解决了这个问题 [在最坏的情况下,最好的情况下 O(n)
]
并且,时间复杂度为 O(n logn)
。
这里,n & m
是births
和deaths
数组的长度。
我不知道 PHP 或 java 脚本。我用 Java 实现了它,逻辑非常简单。但我相信我的想法也可以用这些语言实现。
技术细节:
我使用javaTreeMap
结构来存储出生和死亡记录。
TreeMap
插入数据 sorted (key based) 作为 (key, value) 对,这里 key 是年份,value 是出生和死亡的累计总和(死亡为负数)。
我们不需要插入最高出生年份之后发生的死亡值。
一旦 TreeMap 填充了出生和死亡记录,所有累积总和都会更新,并随着时间的推移存储最大人口和年份。
示例输入和输出:1
Births: [1909, 1919, 1904, 1911, 1908, 1908, 1903, 1901, 1914, 1911, 1900, 1919, 1900, 1908, 1906]
Deaths: [1910, 1911, 1912, 1911, 1914, 1914, 1913, 1915, 1914, 1915]
Year counts Births: {1900=2, 1901=1, 1903=1, 1904=1, 1906=1, 1908=3, 1909=1, 1911=2, 1914=1, 1919=2}
Year counts Birth-Deaths combined: {1900=2, 1901=1, 1903=1, 1904=1, 1906=1, 1908=3, 1909=1, 1910=-1, 1911=0, 1912=-1, 1913=-1, 1914=-2, 1915=-2, 1919=2}
Yearwise population: {1900=2, 1901=3, 1903=4, 1904=5, 1906=6, 1908=9, 1909=10, 1910=9, 1911=9, 1912=8, 1913=7, 1914=5, 1915=3, 1919=5}
maxPopulation: 10
yearOfMaxPopulation: 1909
示例输入和输出:2
Births: [1906, 1901, 1911, 1902, 1905, 1911, 1902, 1905, 1910, 1912, 1900, 1900, 1904, 1913, 1904]
Deaths: [1917, 1908, 1918, 1915, 1907, 1907, 1917, 1917, 1912, 1913, 1905, 1914]
Year counts Births: {1900=2, 1901=1, 1902=2, 1904=2, 1905=2, 1906=1, 1910=1, 1911=2, 1912=1, 1913=1}
Year counts Birth-Deaths combined: {1900=2, 1901=1, 1902=2, 1904=2, 1905=1, 1906=1, 1907=-2, 1908=-1, 1910=1, 1911=2, 1912=0, 1913=0}
Yearwise population: {1900=2, 1901=3, 1902=5, 1904=7, 1905=8, 1906=9, 1907=7, 1908=6, 1910=7, 1911=9, 1912=9, 1913=9}
maxPopulation: 9
yearOfMaxPopulation: 1906
此处,在最后一个出生年份 1913
之后发生的死亡 (1914 & later
) 根本不计算在内,这样可以避免不必要的计算。
对于总共 10 million
数据(出生和死亡合计)和超过 1000 years range
,程序大约需要 3 sec.
完成。
如果与 100 years range
相同大小的数据,则需要 1.3 sec
。
所有输入都是随机取的。
内存明智的做法是保持 currentPopulation
和 currentYear
计算。从对 $births
和 $deaths
数组进行排序开始是一个很好的点,因为冒泡排序不是那么繁重的任务,但允许偷工减料:
<?php
$births = [1997, 1999, 2000];
$deaths = [2000, 2001, 2001];
function highestPopulationYear(array $births, array $deaths): Int {
// sort takes time, but is neccesary for futher optimizations
sort($births);
sort($deaths);
// first death year is a first year where population might decrase
// sorfar max population
$currentYearComputing = $deaths[0];
// year before first death has potential of having the biggest population
$maxY = $currentYearComputing-1;
// calculating population at the begining of the year of first death, start maxPopulation
$population = $maxPop = count(array_splice($births, 0, array_search($deaths[0], $births)));
// instead of every time empty checks: `while(!empty($deaths) || !empty($births))`
// we can control a target time. It reserves a memory, but this slot is decreased
// every iteration.
$iterations = count($deaths) + count($births);
while($iterations > 0) {
while(current($births) === $currentYearComputing) {
$population++;
$iterations--;
array_shift($births); // decreasing memory usage
}
while(current($deaths) === $currentYearComputing) {
$population--;
$iterations--;
array_shift($deaths); // decreasing memory usage
}
if ($population > $maxPop) {
$maxPop = $population;
$maxY = $currentYearComputing;
}
// In $iterations we have a sum of birth/death events left. Assuming all
// are births, if this number added to currentPopulation will never exceed
// current maxPoint, we can break the loop and save some time at cost of
// some memory.
if ($maxPop >= ($population+$iterations)) {
break;
}
$currentYearComputing++;
}
return $maxY;
}
echo highestPopulationYear($births, $deaths);
不太热衷于深入 Big O 事情,留给你了。
此外,如果您重新发现 currentYearComputing
每个循环,您可以将循环更改为 if
语句并只留下一个循环。
while($iterations > 0) {
$changed = false;
if(current($births) === $currentYearComputing) {
// ...
$changed = array_shift($births); // decreasing memory usage
}
if(current($deaths) === $currentYearComputing) {
// ...
$changed = array_shift($deaths); // decreasing memory usage
}
if ($changed === false) {
$currentYearComputing++;
continue;
}
我对这个解决方案很满意,大O的复杂度是n + m
<?php
function getHighestPopulation($births, $deaths){
$max = [];
$currentMax = 0;
$tmpArray = [];
foreach($deaths as $key => $death){
if(!isset($tmpArray[$death])){
$tmpArray[$death] = 0;
}
$tmpArray[$death]--;
}
foreach($births as $k => $birth){
if(!isset($tmpArray[$birth])){
$tmpArray[$birth] = 0;
}
$tmpArray[$birth]++;
if($tmpArray[$birth] > $currentMax){
$max = [$birth];
$currentMax = $tmpArray[$birth];
} else if ($tmpArray[$birth] == $currentMax) {
$max[] = $birth;
}
}
return [$currentMax, $max];
}
$births = [1997, 1997, 1997, 1998, 1999];
$deaths = [1998, 1999];
print_r (getHighestPopulation($births, $deaths));
?>
解决您的问题的最简单明了的方法之一。
$births = [1909, 1919, 1904, 1911, 1908, 1908, 1903, 1901, 1914, 1911, 1900, 1919, 1900, 1908, 1906];
$deaths = [1910, 1911, 1912, 1911, 1914, 1914, 1913, 1915, 1914, 1915];
/* for generating 1 million records
for($i=1;$i<=1000000;$i++) {
$births[] = rand(1900, 2020);
$deaths[] = rand(1900, 2020);
}
*/
function highestPopulationYear(Array $births, Array $deaths): Int {
$start_time = microtime(true);
$population = array_count_values($births);
$deaths = array_count_values($deaths);
foreach ($deaths as $year => $death) {
$population[$year] = ($population[$year] ?? 0) - $death;
}
ksort($population, SORT_NUMERIC);
$cumulativeSum = $maxPopulation = $maxYear = 0;
foreach ($population as $year => &$number) {
$cumulativeSum += $number;
if($maxPopulation < $cumulativeSum) {
$maxPopulation = $cumulativeSum;
$maxYear = $year;
}
}
print " Execution time of function = ".((microtime(true) - $start_time)*1000)." milliseconds";
return $maxYear;
}
print highestPopulationYear($births, $deaths);
输出:
1909
复杂性:
O(m + log(n))
给定两个数组; $births
包含表明某人出生时间的出生年份列表,$deaths
包含表明某人死亡时间的死亡年份列表,我们如何找到人口最多的年份?
例如给定以下数组:
$births = [1984, 1981, 1984, 1991, 1996];
$deaths = [1991, 1984];
人口最多的那一年应该是1996
,因为那一年有3
人在世,是所有那些年中人口最多的一年。
这是 运行 的数学公式:
| Birth | Death | Population | |-------|-------|------------| | 1981 | | 1 | | 1984 | | 2 | | 1984 | 1984 | 2 | | 1991 | 1991 | 2 | | 1996 | | 3 |
假设
我们可以安全地假设某人出生的那一年人口会增加一,而某人去世的那一年人口会减少一。所以在这个例子中,1984年出生了2个人,1984年死了1个人,也就是说那一年人口增加了1。
我们还可以安全地假设死亡人数永远不会超过出生人数,并且当人口为 0 时不会发生死亡。
我们还可以安全地假设 $deaths
和 $births
中的年份永远不会是负值或浮点值(它们总是大于 0).
We cannot assume that the arrays will be sorted or that there won't be duplicate values, however.
要求
我们必须编写一个函数来 return 人口最多的那一年,将这两个数组作为输入。该函数可能 return 0
、false
、""
或 NULL
(任何虚假值都是可接受的)如果输入数组为空,或者总体始终为 0。如果最高人口出现在多个年份,函数可能 return 达到最高人口的第一年或任何后续年份。
例如:
$births = [1997, 1997, 1997, 1998, 1999];
$deaths = [1998, 1999];
/* The highest population was 3 on 1997, 1998 and 1999, either answer is correct */
此外,包含解决方案的 Big O 会有所帮助。
我对此的最佳尝试如下:
function highestPopulationYear(Array $births, Array $deaths): Int {
sort($births);
sort($deaths);
$nextBirthYear = reset($births);
$nextDeathYear = reset($deaths);
$years = [];
if ($nextBirthYear) {
$years[] = $nextBirthYear;
}
if ($nextDeathYear) {
$years[] = $nextDeathYear;
}
if ($years) {
$currentYear = max(0, ...$years);
} else {
$currentYear = 0;
}
$maxYear = $maxPopulation = $currentPopulation = 0;
while(current($births) !== false || current($deaths) !== false || $years) {
while($currentYear === $nextBirthYear) {
$currentPopulation++;
$nextBirthYear = next($births);
}
while($currentYear === $nextDeathYear) {
$currentPopulation--;
$nextDeathYear = next($deaths);
}
if ($currentPopulation >= $maxPopulation) {
$maxPopulation = $currentPopulation;
$maxYear = $currentYear;
}
$years = [];
if ($nextBirthYear) {
$years[] = $nextBirthYear;
}
if ($nextDeathYear) {
$years[] = $nextDeathYear;
}
if ($years) {
$currentYear = min($years);
} else {
$currentYear = 0;
}
}
return $maxYear;
}
上面的算法应该在多项式时间内工作,因为它是最坏的 O(((n log n) * 2) + k)
其中 n
是每个数组中要排序的元素数,k
是出生年份数(因为我们知道 k
总是 k >= y
)其中 y
是死亡年数。但是,我不确定是否有更有效的解决方案。
My interests are purely in an improved Big O of computational complexity upon the existing algorithm. Memory complexity is of no concern. Nor is the runtime optimization. At least it's not a primary concern. Any minor/major runtime optimizations are welcome, but not the key factor here.
我认为我们可以有 O(n log n)
时间和 O(1)
额外的 space 通过首先排序,然后在我们迭代时保持当前人口和全局最大值。我尝试使用当年作为参考点,但逻辑似乎仍然有点棘手,所以我不确定它是否已完全解决。希望它可以提供有关该方法的想法。
JavaScript代码(counterexamples/bugs欢迎)
function f(births, deaths){
births.sort((a, b) => a - b);
deaths.sort((a, b) => a - b);
console.log(JSON.stringify(births));
console.log(JSON.stringify(deaths));
let i = 0;
let j = 0;
let year = births[i];
let curr = 0;
let max = curr;
while (deaths[j] < births[0])
j++;
while (i < births.length || j < deaths.length){
while (year == births[i]){
curr = curr + 1;
i = i + 1;
}
if (j == deaths.length || year < deaths[j]){
max = Math.max(max, curr);
console.log(`year: ${ year }, max: ${ max }, curr: ${ curr }`);
} else if (j < deaths.length && deaths[j] == year){
while (deaths[j] == year){
curr = curr - 1;
j = j + 1;
}
max = Math.max(max, curr);
console.log(`year: ${ year }, max: ${ max }, curr: ${ curr }`);
}
if (j < deaths.length && deaths[j] > year && (i == births.length || deaths[j] < births[i])){
year = deaths[j];
while (deaths[j] == year){
curr = curr - 1;
j = j + 1;
}
console.log(`year: ${ year }, max: ${ max }, curr: ${ curr }`);
}
year = births[i];
}
return max;
}
var input = [
[[1997, 1997, 1997, 1998, 1999],
[1998, 1999]],
[[1, 2, 2, 3, 4],
[1, 2, 2, 5]],
[[1984, 1981, 1984, 1991, 1996],
[1991, 1984, 1997]],
[[1984, 1981, 1984, 1991, 1996],
[1991, 1982, 1984, 1997]]
]
for (let [births, deaths] of input)
console.log(f(births, deaths));
如果年份范围 m
的顺序为 n
,我们可以将每年的计数存储在该范围内,时间复杂度为 O(n)
。如果我们想要花哨的,我们也可以有 O(n * log log m)
时间复杂度,通过使用 Y-fast trie 允许在 O(log log m)
时间内进行后继查找。
我们可以用桶排序在线性时间内解决这个问题。假设输入的大小是n,年的范围是m。
O(n): Find the min and max year across births and deaths.
O(m): Create an array of size max_yr - min_yr + 1, ints initialized to zero.
Treat the first cell of the array as min_yr, the next as min_yr+1, etc...
O(n): Parse the births array, incrementing the appropriate index of the array.
arr[birth_yr - min_yr] += 1
O(n): Ditto for deaths, decrementing the appropriate index of the array.
arr[death_yr - min_yr] -= 1
O(m): Parse your array, keeping track of the cumulative sum and its max value.
最大的累积最大值就是你的答案。
运行的时间是O(n+m),额外需要的space是O(m)。
如果 m 是 O(n),则这是 n 的线性解;即,如果年份范围的增长速度不超过出生和死亡人数的增长速度。对于现实世界的数据,这几乎肯定是正确的。
$births = [1984, 1981, 1984, 1991, 1996];
$deaths = [1991, 1984];
$years = array_unique(array_merge($births, $deaths));
sort($years);
$increaseByYear = array_count_values($births);
$decreaseByYear = array_count_values($deaths);
$populationByYear = array();
foreach ($years as $year) {
$increase = $increaseByYear[$year] ?? 0;
$decrease = $decreaseByYear[$year] ?? 0;
$previousPopulationTally = end($populationByYear);
$populationByYear[$year] = $previousPopulationTally + $increase - $decrease;
}
$maxPopulation = max($populationByYear);
$maxPopulationYears = array_keys($populationByYear, $maxPopulation);
$maxPopulationByYear = array_fill_keys($maxPopulationYears, $maxPopulation);
print_r($maxPopulationByYear);
这将解释平局年份的可能性,以及某人死亡的年份与某人的出生年份不对应的情况。
首先将出生和死亡汇总到地图 (year => population change
) 中,按键排序,然后计算 运行 人口。
这应该大约是 O(2n + n log n)
,其中 n
是出生人数。
$births = [1984, 1981, 1984, 1991, 1996];
$deaths = [1991, 1984];
function highestPopulationYear(array $births, array $deaths): ?int
{
$indexed = [];
foreach ($births as $birth) {
$indexed[$birth] = ($indexed[$birth] ?? 0) + 1;
}
foreach ($deaths as $death) {
$indexed[$death] = ($indexed[$death] ?? 0) - 1;
}
ksort($indexed);
$maxYear = null;
$max = $current = 0;
foreach ($indexed as $year => $change) {
$current += $change;
if ($current >= $max) {
$max = $current;
$maxYear = $year;
}
}
return $maxYear;
}
var_dump(highestPopulationYear($births, $deaths));
我用 O(n+m)
的内存需求解决了这个问题 [在最坏的情况下,最好的情况下 O(n)
]
并且,时间复杂度为 O(n logn)
。
这里,n & m
是births
和deaths
数组的长度。
我不知道 PHP 或 java 脚本。我用 Java 实现了它,逻辑非常简单。但我相信我的想法也可以用这些语言实现。
技术细节:
我使用javaTreeMap
结构来存储出生和死亡记录。
TreeMap
插入数据 sorted (key based) 作为 (key, value) 对,这里 key 是年份,value 是出生和死亡的累计总和(死亡为负数)。
我们不需要插入最高出生年份之后发生的死亡值。
一旦 TreeMap 填充了出生和死亡记录,所有累积总和都会更新,并随着时间的推移存储最大人口和年份。
示例输入和输出:1
Births: [1909, 1919, 1904, 1911, 1908, 1908, 1903, 1901, 1914, 1911, 1900, 1919, 1900, 1908, 1906]
Deaths: [1910, 1911, 1912, 1911, 1914, 1914, 1913, 1915, 1914, 1915]
Year counts Births: {1900=2, 1901=1, 1903=1, 1904=1, 1906=1, 1908=3, 1909=1, 1911=2, 1914=1, 1919=2}
Year counts Birth-Deaths combined: {1900=2, 1901=1, 1903=1, 1904=1, 1906=1, 1908=3, 1909=1, 1910=-1, 1911=0, 1912=-1, 1913=-1, 1914=-2, 1915=-2, 1919=2}
Yearwise population: {1900=2, 1901=3, 1903=4, 1904=5, 1906=6, 1908=9, 1909=10, 1910=9, 1911=9, 1912=8, 1913=7, 1914=5, 1915=3, 1919=5}
maxPopulation: 10
yearOfMaxPopulation: 1909
示例输入和输出:2
Births: [1906, 1901, 1911, 1902, 1905, 1911, 1902, 1905, 1910, 1912, 1900, 1900, 1904, 1913, 1904]
Deaths: [1917, 1908, 1918, 1915, 1907, 1907, 1917, 1917, 1912, 1913, 1905, 1914]
Year counts Births: {1900=2, 1901=1, 1902=2, 1904=2, 1905=2, 1906=1, 1910=1, 1911=2, 1912=1, 1913=1}
Year counts Birth-Deaths combined: {1900=2, 1901=1, 1902=2, 1904=2, 1905=1, 1906=1, 1907=-2, 1908=-1, 1910=1, 1911=2, 1912=0, 1913=0}
Yearwise population: {1900=2, 1901=3, 1902=5, 1904=7, 1905=8, 1906=9, 1907=7, 1908=6, 1910=7, 1911=9, 1912=9, 1913=9}
maxPopulation: 9
yearOfMaxPopulation: 1906
此处,在最后一个出生年份 1913
之后发生的死亡 (1914 & later
) 根本不计算在内,这样可以避免不必要的计算。
对于总共 10 million
数据(出生和死亡合计)和超过 1000 years range
,程序大约需要 3 sec.
完成。
如果与 100 years range
相同大小的数据,则需要 1.3 sec
。
所有输入都是随机取的。
内存明智的做法是保持 currentPopulation
和 currentYear
计算。从对 $births
和 $deaths
数组进行排序开始是一个很好的点,因为冒泡排序不是那么繁重的任务,但允许偷工减料:
<?php
$births = [1997, 1999, 2000];
$deaths = [2000, 2001, 2001];
function highestPopulationYear(array $births, array $deaths): Int {
// sort takes time, but is neccesary for futher optimizations
sort($births);
sort($deaths);
// first death year is a first year where population might decrase
// sorfar max population
$currentYearComputing = $deaths[0];
// year before first death has potential of having the biggest population
$maxY = $currentYearComputing-1;
// calculating population at the begining of the year of first death, start maxPopulation
$population = $maxPop = count(array_splice($births, 0, array_search($deaths[0], $births)));
// instead of every time empty checks: `while(!empty($deaths) || !empty($births))`
// we can control a target time. It reserves a memory, but this slot is decreased
// every iteration.
$iterations = count($deaths) + count($births);
while($iterations > 0) {
while(current($births) === $currentYearComputing) {
$population++;
$iterations--;
array_shift($births); // decreasing memory usage
}
while(current($deaths) === $currentYearComputing) {
$population--;
$iterations--;
array_shift($deaths); // decreasing memory usage
}
if ($population > $maxPop) {
$maxPop = $population;
$maxY = $currentYearComputing;
}
// In $iterations we have a sum of birth/death events left. Assuming all
// are births, if this number added to currentPopulation will never exceed
// current maxPoint, we can break the loop and save some time at cost of
// some memory.
if ($maxPop >= ($population+$iterations)) {
break;
}
$currentYearComputing++;
}
return $maxY;
}
echo highestPopulationYear($births, $deaths);
不太热衷于深入 Big O 事情,留给你了。
此外,如果您重新发现 currentYearComputing
每个循环,您可以将循环更改为 if
语句并只留下一个循环。
while($iterations > 0) {
$changed = false;
if(current($births) === $currentYearComputing) {
// ...
$changed = array_shift($births); // decreasing memory usage
}
if(current($deaths) === $currentYearComputing) {
// ...
$changed = array_shift($deaths); // decreasing memory usage
}
if ($changed === false) {
$currentYearComputing++;
continue;
}
我对这个解决方案很满意,大O的复杂度是n + m
<?php
function getHighestPopulation($births, $deaths){
$max = [];
$currentMax = 0;
$tmpArray = [];
foreach($deaths as $key => $death){
if(!isset($tmpArray[$death])){
$tmpArray[$death] = 0;
}
$tmpArray[$death]--;
}
foreach($births as $k => $birth){
if(!isset($tmpArray[$birth])){
$tmpArray[$birth] = 0;
}
$tmpArray[$birth]++;
if($tmpArray[$birth] > $currentMax){
$max = [$birth];
$currentMax = $tmpArray[$birth];
} else if ($tmpArray[$birth] == $currentMax) {
$max[] = $birth;
}
}
return [$currentMax, $max];
}
$births = [1997, 1997, 1997, 1998, 1999];
$deaths = [1998, 1999];
print_r (getHighestPopulation($births, $deaths));
?>
解决您的问题的最简单明了的方法之一。
$births = [1909, 1919, 1904, 1911, 1908, 1908, 1903, 1901, 1914, 1911, 1900, 1919, 1900, 1908, 1906];
$deaths = [1910, 1911, 1912, 1911, 1914, 1914, 1913, 1915, 1914, 1915];
/* for generating 1 million records
for($i=1;$i<=1000000;$i++) {
$births[] = rand(1900, 2020);
$deaths[] = rand(1900, 2020);
}
*/
function highestPopulationYear(Array $births, Array $deaths): Int {
$start_time = microtime(true);
$population = array_count_values($births);
$deaths = array_count_values($deaths);
foreach ($deaths as $year => $death) {
$population[$year] = ($population[$year] ?? 0) - $death;
}
ksort($population, SORT_NUMERIC);
$cumulativeSum = $maxPopulation = $maxYear = 0;
foreach ($population as $year => &$number) {
$cumulativeSum += $number;
if($maxPopulation < $cumulativeSum) {
$maxPopulation = $cumulativeSum;
$maxYear = $year;
}
}
print " Execution time of function = ".((microtime(true) - $start_time)*1000)." milliseconds";
return $maxYear;
}
print highestPopulationYear($births, $deaths);
输出:
1909
复杂性:
O(m + log(n))