Out of memory when using CuPy
I get an out-of-memory error when using CuPy on some large arrays, but when I check nvidia-smi the memory usage never reaches my GPU's limit. My card is an NVIDIA GeForce RTX 2060 with 6 GB of VRAM. Here is my code:
import cupy as cp
mempool = cp.get_default_memory_pool()
print(mempool.used_bytes()) # 0
print(mempool.total_bytes()) # 0
a = cp.random.randint(0, 256, (10980, 10980)).astype(cp.uint8)
a = a.ravel()
print(a.nbytes) # 120560400
print(mempool.used_bytes()) # 120560640
print(mempool.total_bytes()) # 602803712
# after creating this array, nvidia-smi shows:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.86 Driver Version: 430.86 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2060 WDDM | 00000000:01:00.0 On | N/A |
| N/A 46C P8 9W / N/A | 1280MiB / 6144MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
# but then I run this command, and an error comes out
s_values, s_idx, s_counts = cp.unique(
a, return_inverse=True, return_counts=True)
# and the error shows
# cupy.cuda.memory.OutOfMemoryError: out of memory to allocate 964483584 bytes (total 5545867264 bytes)
# the nvidia-smi shows
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.86 Driver Version: 430.86 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2060 WDDM | 00000000:01:00.0 On | N/A |
| N/A 45C P8 9W / N/A | 5075MiB / 6144MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
It looks like there is enough space left. Why does this error occur? Is it because my GPU really doesn't have enough memory, or is my code wrong, or am I not allocating memory correctly?
Isn't 964,483,584 bigger than your mempool.total_bytes() of 602,803,712?
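The size of the failing request also lines up with the int64 inverse-index array that `return_inverse=True` has to allocate, one 8-byte entry per element. A quick sanity check (the 512-byte allocation unit is an assumption about how CuPy's memory pool rounds requests):

```python
n = 10980 * 10980            # elements in the flattened uint8 array
inverse_bytes = n * 8        # return_inverse builds an int64 index per element
# assumed: CuPy's pool rounds each allocation up to a 512-byte unit
rounded = (inverse_bytes + 511) // 512 * 512
print(rounded)               # 964483584 -- the exact size in the error message
```

So the failed 964 MB request is most likely just the inverse-index buffer, on top of the other temporaries `cp.unique` needs for sorting.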
As mentioned in the comments, you can do the computation in batches instead of all at once.
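For this particular workload (uint8 data, so only 256 possible values), one batched approach is to accumulate per-value counts chunk by chunk with `bincount` instead of calling `unique` on the whole array. A minimal sketch in NumPy, since CuPy mirrors the NumPy API; swapping `np` for `cp` should give the GPU version. The array size and chunk size here are arbitrary stand-ins:

```python
import numpy as np

# stand-in for the 120M-element uint8 array from the question (smaller here)
a = np.random.randint(0, 256, 1_000_000).astype(np.uint8)

chunk = 100_000                        # elements processed per batch
counts = np.zeros(256, dtype=np.int64)
for start in range(0, a.size, chunk):
    counts += np.bincount(a[start:start + chunk], minlength=256)

s_values = np.nonzero(counts)[0].astype(np.uint8)  # the unique values
s_counts = counts[s_values]                        # their occurrence counts

# matches the all-at-once result, without a full-size int64 temporary
ref_values, ref_counts = np.unique(a, return_counts=True)
assert (s_values == ref_values).all() and (s_counts == ref_counts).all()
```

Note this does not reproduce `return_inverse`; if you need the inverse index, it can also be built chunk by chunk (e.g. through a 256-entry lookup table) so that no single allocation has to hold an int64 copy of the whole array.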
You can use Dask to do the same thing; it handles the parallelization on your behalf, and you will never run out of memory even when the data does not fit in RAM. I'm attaching a link in which the author himself explains how to do it.
from dask.distributed import Client, LocalCluster
import dask.array as da
import numpy as np

cluster = LocalCluster()  # use multiple CPUs on the machine/cluster
client = Client(cluster)
client

rs = da.random.RandomState(RandomState=np.random.RandomState)
x = rs.random((100000, 40000), chunks=(10000, 400))  # 29.80 GB of ndarray
x  # just ensure that the chunk size is small  # 30.52 MB per chunk
da.exp(x).mean().compute()  # don't return an element-wise transformed ndarray; always reduce first
da.exp(x).compute()  # do not run this line, as it will lead to an out-of-memory error
On the last line, Dask tries to hold the whole output in memory. Since the output is on the order of 29+ GB, you will run out of memory.
YouTube link for the explanation of the above code by the author of Dask
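The sizes quoted in the code comments above check out with a little arithmetic (float64 elements are 8 bytes each):

```python
# full array: 100,000 x 40,000 float64 elements
total_bytes = 100_000 * 40_000 * 8
print(total_bytes / 2**30)   # ~29.80 GiB -- too big to materialize at once

# one chunk: 10,000 x 400 float64 elements
chunk_bytes = 10_000 * 400 * 8
print(chunk_bytes / 2**20)   # ~30.52 MiB -- comfortably fits in memory
```

Only one chunk (plus a small constant number of neighbors) needs to live in memory at a time, which is why the reduction succeeds where the full `da.exp(x)` result cannot.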