3 个嵌套循环:优化简单模拟以提高速度
3 nested loops: Optimizing a simple simulation for speed
背景
我遇到了一个难题。在这里:
One day, an alien comes to Earth. Every day, each alien does one of four things, each with equal probability to:
- Kill himself
- Do nothing
- Split himself into two aliens (while killing himself)
- split himself into three aliens (while killing himself)
What is the probability that the alien species eventually dies out entirely?
Link to the source and the solution, problem #10
很遗憾,我没能从理论上解决问题。然后我继续用基本的马尔可夫链和 Monte Carlo 模拟来模拟它。
面试时没有问我这个问题。我从朋友那里得知了这个问题,然后在搜索数学解决方案时找到了上面的link。
重新解释问题
我们从外星人的数量开始n = 1
。 n
有机会不变,减1
,加1
,减2
,每次%25。如果 n
递增,即外星人倍增,我们再次重复此过程 n
次。这对应于每个外星人都会做一遍它的事情。不过,我必须设置一个上限,以便我们停止模拟并避免崩溃。 n
可能会增加,我们会一次又一次地循环 n
次。
如果外星人以某种方式灭绝,我们将再次停止模拟,因为没有什么可以模拟了。
在n
达到零或上限后,我们还记录人口(它将是零或某个数字>= max_pop
)。
我重复了很多次并记录了每一个结果。最后,零的数量除以结果总数应该给我一个近似值。
密码
from random import randint
import numpy as np
pop_max = 100
iter_max = 100000
results = np.zeros(iter_max, dtype=int)
for i in range(iter_max):
n = 1
while n > 0 and n < pop_max:
for j in range(n):
x = randint(1, 4)
if x == 1:
n = n - 1
elif x == 2:
continue
elif x == 3:
n = n + 1
elif x == 4:
n = n + 2
results[i] = n
print( np.bincount(results)[0] / iter_max )
iter_max
和pop_max
确实可以改,但是我觉得如果有100个外星人,他们灭绝的概率就可以忽略不计了。不过这只是一个猜测,我还没有做任何事情来计算(更多)适当的人口上限。
这段代码给出了有希望的结果,非常接近大约 %41.4 的真实答案。
一些输出
> python aliens.py
0.41393
> python aliens.py
0.41808
> python aliens.py
0.41574
> python aliens.py
0.4149
> python aliens.py
0.41505
> python aliens.py
0.41277
> python aliens.py
0.41428
> python aliens.py
0.41407
> python aliens.py
0.41676
后果
我对结果没意见,但这段代码所用的时间我不能说相同。大约需要 16-17 秒:)
如何提高速度?如何优化循环(尤其是 while
循环)?也许有更好的方法或更好的模型?
您可以通过使用 numpy(更快)一次生成 n
个随机整数来向量化您的内部循环,并使用算术而不是布尔逻辑摆脱所有 if 语句。
while...:
#population changes by (-1, 0, +1, +2) for each alien
n += np.random.randint(-1,3, size=n).sum()
对其他所有内容使用您的确切代码(您可能会在其他地方找到其他优化)我使用这一更改从 21.2 秒减少到 4.3 秒。
在不更改算法(即使用 monte carlo 以外的方法求解)的情况下,在您开始编译为机器代码(幸运的是如果你安装了 numba 就很容易了。
我不会提供有关 numba 执行的即时编译的完整教程,相反,我只会分享我的代码并记下我所做的更改:
from time import time
import numpy as np
from numpy.random import randint
from numba import njit, int32, prange
@njit('i4(i4)')
def simulate(pop_max): #move simulation of one population to a function for parallelization
n = 1
while 0 < n < pop_max:
n += np.sum(randint(-1,3,n))
return n
@njit('i4[:](i4,i4)', parallel=True)
def solve(pop_max, iter_max):
#this could be easily simplified to just return the raio of populations that die off vs survive to pop_max
# which would save you some ram (though the speed is about the same)
results = np.zeros(iter_max, dtype=int32) #numba needs int32 here rather than python int
for i in prange(iter_max): #prange specifies that this loop can be parallelized
results[i] = simulate(pop_max)
return results
pop_max = 100
iter_max = 100000
t = time()
print( np.bincount(solve(pop_max, iter_max))[0] / iter_max )
print('time elapsed: ', time()-t)
在我的系统上,并行化编译使计算速度降低到大约 0.15 秒。
没有 numpy 解决方案,100k 次模拟大约需要 5 秒:
from random import choices
def simulate_em():
def spwn(aliens):
return choices(range(-1,3), k=aliens)
aliens = {1:1}
i = 1
while aliens[i] > 0 and aliens[i] < 100:
i += 1
num = aliens[i-1]
aliens[i] = num + sum(spwn(num))
# commented for speed
# print(f"Round {i:<5} had {aliens[i]:>20} alien alive.")
return (i,aliens[i])
正在测试(在 pyfiddle.io 上大约 5 秒):
from datetime import datetime
t = datetime.now()
d = {}
wins = 0
test = 100000
for k in range(test):
d[k] = simulate_em()
wins += d[k][1]>=100
print(1-wins/test) # 0.41532
print(datetime.now()-t) # 0:00:04.840127
因此 10 万次测试运行大约需要 5 秒...
输出(2 次运行):
Round 1 had 1 alien alive.
Round 2 had 3 alien alive.
Round 3 had 6 alien alive.
Round 4 had 9 alien alive.
Round 5 had 7 alien alive.
Round 6 had 13 alien alive.
Round 7 had 23 alien alive.
Round 8 had 20 alien alive.
Round 9 had 37 alien alive.
Round 10 had 54 alien alive.
Round 11 had 77 alien alive.
Round 12 had 118 alien alive.
Round 1 had 1 alien alive.
Round 2 had 0 alien alive.
通过使用 amount_of_aliens
+ sum
而不是 choices(range(-1,3),k=amount_of_aliens)
,您可以更轻松地求和并更快地填写字典?如果您的外星人数量低于 0,它们就会灭绝。
背景
我遇到了一个难题。在这里:
One day, an alien comes to Earth. Every day, each alien does one of four things, each with equal probability to:
- Kill himself
- Do nothing
- Split himself into two aliens (while killing himself)
- split himself into three aliens (while killing himself)
What is the probability that the alien species eventually dies out entirely?
Link to the source and the solution, problem #10
很遗憾,我没能从理论上解决问题。然后我继续用基本的马尔可夫链和 Monte Carlo 模拟来模拟它。
面试时没有问我这个问题。我从朋友那里得知了这个问题,然后在搜索数学解决方案时找到了上面的link。
重新解释问题
我们从外星人的数量开始n = 1
。 n
有机会不变,减1
,加1
,减2
,每次%25。如果 n
递增,即外星人倍增,我们再次重复此过程 n
次。这对应于每个外星人都会做一遍它的事情。不过,我必须设置一个上限,以便我们停止模拟并避免崩溃。 n
可能会增加,我们会一次又一次地循环 n
次。
如果外星人以某种方式灭绝,我们将再次停止模拟,因为没有什么可以模拟了。
在n
达到零或上限后,我们还记录人口(它将是零或某个数字>= max_pop
)。
我重复了很多次并记录了每一个结果。最后,零的数量除以结果总数应该给我一个近似值。
密码
from random import randint
import numpy as np
pop_max = 100
iter_max = 100000
results = np.zeros(iter_max, dtype=int)
for i in range(iter_max):
n = 1
while n > 0 and n < pop_max:
for j in range(n):
x = randint(1, 4)
if x == 1:
n = n - 1
elif x == 2:
continue
elif x == 3:
n = n + 1
elif x == 4:
n = n + 2
results[i] = n
print( np.bincount(results)[0] / iter_max )
iter_max
和pop_max
确实可以改,但是我觉得如果有100个外星人,他们灭绝的概率就可以忽略不计了。不过这只是一个猜测,我还没有做任何事情来计算(更多)适当的人口上限。
这段代码给出了有希望的结果,非常接近大约 %41.4 的真实答案。
一些输出
> python aliens.py
0.41393
> python aliens.py
0.41808
> python aliens.py
0.41574
> python aliens.py
0.4149
> python aliens.py
0.41505
> python aliens.py
0.41277
> python aliens.py
0.41428
> python aliens.py
0.41407
> python aliens.py
0.41676
后果
我对结果没意见,但这段代码所用的时间我不能说相同。大约需要 16-17 秒:)
如何提高速度?如何优化循环(尤其是 while
循环)?也许有更好的方法或更好的模型?
您可以通过使用 numpy(更快)一次生成 n
个随机整数来向量化您的内部循环,并使用算术而不是布尔逻辑摆脱所有 if 语句。
while...:
#population changes by (-1, 0, +1, +2) for each alien
n += np.random.randint(-1,3, size=n).sum()
对其他所有内容使用您的确切代码(您可能会在其他地方找到其他优化)我使用这一更改从 21.2 秒减少到 4.3 秒。
在不更改算法(即使用 monte carlo 以外的方法求解)的情况下,在您开始编译为机器代码(幸运的是如果你安装了 numba 就很容易了。
我不会提供有关 numba 执行的即时编译的完整教程,相反,我只会分享我的代码并记下我所做的更改:
from time import time
import numpy as np
from numpy.random import randint
from numba import njit, int32, prange
@njit('i4(i4)')
def simulate(pop_max): #move simulation of one population to a function for parallelization
n = 1
while 0 < n < pop_max:
n += np.sum(randint(-1,3,n))
return n
@njit('i4[:](i4,i4)', parallel=True)
def solve(pop_max, iter_max):
#this could be easily simplified to just return the raio of populations that die off vs survive to pop_max
# which would save you some ram (though the speed is about the same)
results = np.zeros(iter_max, dtype=int32) #numba needs int32 here rather than python int
for i in prange(iter_max): #prange specifies that this loop can be parallelized
results[i] = simulate(pop_max)
return results
pop_max = 100
iter_max = 100000
t = time()
print( np.bincount(solve(pop_max, iter_max))[0] / iter_max )
print('time elapsed: ', time()-t)
在我的系统上,并行化编译使计算速度降低到大约 0.15 秒。
没有 numpy 解决方案,100k 次模拟大约需要 5 秒:
from random import choices
def simulate_em():
def spwn(aliens):
return choices(range(-1,3), k=aliens)
aliens = {1:1}
i = 1
while aliens[i] > 0 and aliens[i] < 100:
i += 1
num = aliens[i-1]
aliens[i] = num + sum(spwn(num))
# commented for speed
# print(f"Round {i:<5} had {aliens[i]:>20} alien alive.")
return (i,aliens[i])
正在测试(在 pyfiddle.io 上大约 5 秒):
from datetime import datetime
t = datetime.now()
d = {}
wins = 0
test = 100000
for k in range(test):
d[k] = simulate_em()
wins += d[k][1]>=100
print(1-wins/test) # 0.41532
print(datetime.now()-t) # 0:00:04.840127
因此 10 万次测试运行大约需要 5 秒...
输出(2 次运行):
Round 1 had 1 alien alive.
Round 2 had 3 alien alive.
Round 3 had 6 alien alive.
Round 4 had 9 alien alive.
Round 5 had 7 alien alive.
Round 6 had 13 alien alive.
Round 7 had 23 alien alive.
Round 8 had 20 alien alive.
Round 9 had 37 alien alive.
Round 10 had 54 alien alive.
Round 11 had 77 alien alive.
Round 12 had 118 alien alive.
Round 1 had 1 alien alive.
Round 2 had 0 alien alive.
通过使用 amount_of_aliens
+ sum
而不是 choices(range(-1,3),k=amount_of_aliens)
,您可以更轻松地求和并更快地填写字典?如果您的外星人数量低于 0,它们就会灭绝。