Big difference in execution time for first and subsequent run of cupy functions
When I run a CuPy function on a CuPy array, the first call takes noticeably longer than the second run, even when the second call operates on a different array.
Why is that?
import cupy as cp
from time import time

cp.__version__
# 7.5.0

A = cp.random.random((1024, 1024))
B = cp.random.random((1024, 1024))

def test(func, *args):
    t = time()
    func(*args)
    print("{}".format(round(time() - t, 4)))
test(cp.fft.fft2, A)
test(cp.fft.fft2, B)
# 0.129
# 0.001
test(cp.matmul, A, A.T)
test(cp.matmul, B, B.T)
# 0.171
# 0.0
test(cp.linalg.inv, A)
test(cp.linalg.inv, B)
# 0.259
# 0.002
CuPy compiles its kernels just-in-time the first time a function is used within a Python process, and that compilation takes time.
From the CuPy documentation:
CuPy uses on-the-fly kernel synthesis: when a kernel call is required,
it compiles a kernel code optimized for the shapes and dtypes of given
arguments, sends it to the GPU device, and executes the kernel. The
compiled code is cached to $(HOME)/.cupy/kernel_cache directory (this
cache path can be overwritten by setting the CUPY_CACHE_DIR
environment variable). It may make things slower at the first kernel
call, though this slow down will be resolved at the second execution.
CuPy also caches the kernel code sent to GPU device within the
process, which reduces the kernel transfer time on further calls.
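As the quoted passage notes, the on-disk cache location can be redirected through the `CUPY_CACHE_DIR` environment variable before the process starts. A minimal sketch (the directory shown is an arbitrary example, not a CuPy default):

```shell
# Redirect CuPy's kernel cache; /tmp/cupy_kernel_cache is just an
# illustrative path. Set this before launching the Python process.
export CUPY_CACHE_DIR=/tmp/cupy_kernel_cache
```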
And according to the CuPy user guide:
Context Initialization:
It may take several seconds when calling a
CuPy function for the first time in a process. This is because the
CUDA driver creates a CUDA context during the first CUDA API call in
CUDA applications.
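One more caveat about the numbers in the question: CuPy kernel launches are asynchronous, so timing without an explicit device synchronization mostly measures launch overhead rather than execution time. A hedged sketch of a warm-up-plus-synchronize timing pattern follows; the NumPy fallback is only there so the snippet also runs on machines without a GPU, and is not part of the recommended CuPy workflow:

```python
from time import perf_counter

try:
    import cupy as xp

    def sync():
        # Wait for all queued GPU work to finish before reading the clock.
        xp.cuda.Device().synchronize()
except ImportError:  # CPU-only fallback so the sketch stays runnable
    import numpy as xp

    def sync():
        pass

def timed(func, *args):
    sync()                      # exclude previously queued work
    t0 = perf_counter()
    out = func(*args)
    sync()                      # include this call's kernels in the timing
    return out, perf_counter() - t0

A = xp.random.random((1024, 1024))
res, first = timed(xp.fft.fft2, A)    # first call: JIT/context/plan cost
res, second = timed(xp.fft.fft2, A)   # subsequent call: caches are warm
print("first: {:.4f}s, second: {:.4f}s".format(first, second))
```

In practice this means benchmarks should discard (or report separately) the first call, which pays the one-time compilation and context-creation cost the answer describes.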