在python中多次调用函数的计算时间有什么办法可以加快吗？

Question

    import numpy as np
    import matplotlib.pyplot as plt
    from numpy import random
    import time
    from collections import Counter
   
    def simulation(N):
        I = 10
        success = 0
        M = 100
        for i in range(I):
            s = allocate(N,M)
            M -= s
            success += s
        return success

    def allocate(N,M):
        count = Counter(random.randint(N,size = M))
        success = sum(j for v,j in count.items() if j == 1)
        return success

    if __name__ == "__main__":
        start = time.perf_counter() 
        SAMPLE_SIZE = 100000
        N = np.linspace(5,45,41).astype(int)
        Ps = []
        for n in N:
            ps = []
            for _ in range(SAMPLE_SIZE):
                ps.append(simulation(n)/100)
            result = np.average(np.array(ps))
            Ps.append(result)

        elapsed = (time.perf_counter() - start)
        print("Time used:",elapsed)
        plt.scatter(N,Ps)
        plt.show()

这是我的情况。最终目标是将 SAMPLE_SIZE 设置为 10^7。但是，当我将它设置为 10^5 时，它已经需要大约 1000 秒才能运行它。有什么办法可以让它更高效更快？谢谢你给我建议。

Answer 1

首先，allocate 的实现效率不是很高：您可以使用矢量化 Numpy 函数 来做到这一点：

def allocate(N, M):
    success = np.count_nonzero(np.bincount(random.randint(N, size=M)) == 1)
    return success

大多数情况下，Numpy 函数执行一些检查和创建一些临时数组的开销。您可以使用 Numba 来解决这个问题：

import numba as nb

@nb.njit('int_(int_, int_)')
def allocate(N,M):
    tmp = np.zeros(N, np.int_)
    for i in range(M):
        rnd = np.random.randint(0, N)
        tmp[rnd] += 1
    count = 0
    for i in range(N):
        count += tmp[i] == 1
    return count

然后，您可以通过对 simulation 函数使用 Numba 装饰器 @nb.njit('int_(int_)') 来进一步加快代码速度，从而避免从 C[=43= 调用 Numba 函数的开销] 口译员。

最后，您可以通过运行与 Numba 并行来加速主循环（同时避免使用 慢速列表).您还可以回收 tmp 数组，以免造成太多分配（这很昂贵并且不随内核数量扩展）。这是生成的最终代码：

import numpy as np
import matplotlib.pyplot as plt
import time
import numba as nb

# Recycle the `tmp` buffer not to do many allocations
@nb.njit('int_(int_, int_, int_[::1])')
def allocate(N, M, tmp):
    tmp.fill(0)
    for i in range(M):
        rnd = np.random.randint(0, N)
        tmp[rnd] += 1
    count = 0
    for i in range(N):
        count += tmp[i] == 1
    return count

@nb.njit('int_(int_)')
def simulation(N):
    I = 10
    success = 0
    M = 100
    tmp = np.zeros(N, np.int_) # Preallocated buffer
    for i in range(I):
        s = allocate(N, M, tmp)
        M -= s
        success += s
    return success

@nb.njit('float64(int_, int_)', parallel=True)
def compute_ps_avg(n, sample_size):
    ps = np.zeros(sample_size, dtype=np.float64)
    for i in nb.prange(sample_size):
        ps[i] = simulation(n) / 100.0
    # Note that np.average is not yet supported by Numba
    return np.mean(ps)

if __name__ == "__main__":
    start = time.perf_counter() 
    SAMPLE_SIZE = 100_000
    N = np.linspace(5,45,41).astype(int)
    Ps = [compute_ps_avg(n, SAMPLE_SIZE) for n in N]
    elapsed = (time.perf_counter() - start)
    print("Time used:",elapsed)
    plt.scatter(N,Ps)
    plt.show()

以下是我的 10 核机器上的性能结果：

Initial code:          670.6 s
Optimized Numba code:    3.9 s

生成的代码快 172 倍。

80%以上的时间花在了随机数的生成上。因此，如果您希望代码更快，一种解决方案是使用 SIMD-optimized 随机数生成器来加快随机数的生成。不幸的是，AFAIK，这不可能（有效地）在 Python 中实现。你当然需要使用像 C 或 C++ 这样的本地语言来做到这一点。

Answer 2

我可能漏掉了重点，但您似乎可以用封闭式计算代替您的模拟。

我相信您要解决的问题是在给定 M 个球随机分布在 N 个盒子中的情况下，找到其中恰好有 1 个球的盒子的预期数量。

按照这里的答案https://math.stackexchange.com/a/66094得到封闭形式的表达式（André Nicolas 用 10 个球和 5 个盒子解决了这个问题，但你应该可以推断）

（作为旁注，我通常也会编写代码来确认我的概率计算是正确的，如果这是你正在做的事情，我很抱歉说出显而易见的事情：P）

在python中多次调用函数的计算时间有什么办法可以加快吗？

Is there any way to speed up the computation time of calling a function with multiple time in python?

python

performance

review