PULSE on GitHub (link provided): "RuntimeError: CUDA out of memory" prevents the program "run.py" from executing

Question

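(As a student I am fairly new to this, but I have done quite a lot of research and gotten pretty far, and I am really enjoying learning new things through this!)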

This question is about the project PULSE -> https://github.com/adamian98/pulse

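If you scroll down a little on that page, the README explains the project better than I can. It also gives a direct "correct" path to compare my steps against, which should make the problem much easier to solve.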

Objective: run the program using the run.py file

Problem: despite having a compatible GPU and enough VRAM, I get "RuntimeError: CUDA out of memory"

Knowledge: I only started coding a few days ago. I have now used Anaconda for a dozen or so hours and am comfortable creating environments.

What I did... (the list below is a summary; the details come after it)

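Installed Anaconda
Used this .yml file -> https://github.com/leihuayi/pulse/blob/feature/docker/pulse.yml (it changes the dependencies so they work on Windows, which is why I needed to get it from somewhere other than the master GitHub page) to create a new environment and install the required packages. It worked very well! The only errors came when I tried to install dlib, which seems to be incompatible with many of the packages and with my Python version.
Installed CUDA Toolkit 10.2 and CMake 3.17.2, then tried to install dlib directly into the environment. Errors poured out in a blaze of glory. The dlib package seems to be needed only by a different .py file, not by run.py, so I think it is probably unrelated to this error.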

The logs are below, along with a more detailed explanation of my process.

Start of details and logs: everything from here to the "Details part 2" section should be enough to solve this; the rest is just in case.

Out-of-memory error log --> (after executing the "run.py" file)

Loading Synthesis Network
Loading Mapping Network
Running Mapping Network
Traceback (most recent call last):
  File "C:\Users\micha\anaconda3\envs\Pulse1\pulse-master\run.py", line 58, in <module>
    model = PULSE(cache_dir=kwargs["cache_dir"])
  File "C:\Users\micha\anaconda3\envs\Pulse1\pulse-master\PULSE.py", line 44, in __init__
    latent_out = torch.nn.LeakyReLU(5)(mapping(latent))
  File "C:\Users\micha\anaconda3\envs\pulse3\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\micha\anaconda3\envs\Pulse1\pulse-master\stylegan.py", line 233, in forward
    x = super().forward(x)
  File "C:\Users\micha\anaconda3\envs\pulse3\lib\site-packages\torch\nn\modules\container.py", line 100, in forward
    input = module(input)
  File "C:\Users\micha\anaconda3\envs\pulse3\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\micha\anaconda3\envs\Pulse1\pulse-master\stylegan.py", line 38, in forward
    return F.linear(x, self.weight * self.w_mul, bias)
  File "C:\Users\micha\anaconda3\envs\pulse3\lib\site-packages\torch\nn\functional.py", line 1610, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: CUDA out of memory. Tried to allocate 1.91 GiB (GPU 0; 6.00 GiB total capacity; 3.92 GiB already allocated; 744.91 MiB free; 3.93 GiB reserved in total by PyTorch)

End of error log.
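The traceback loads run.py from envs\Pulse1\pulse-master but torch from envs\pulse3\lib\site-packages, so two environments are involved. Which packages get used is decided by the interpreter that runs the script, not by the folder the script sits in. As a minimal sketch (assuming it is run with the same `python` command used to launch run.py), this prints the active interpreter, the conda environment name and where torch would be imported from:

```python
import importlib.util
import os
import sys


def describe_environment(package="torch"):
    """Report the active interpreter, conda environment and where `package` would be imported from."""
    # find_spec locates a top-level package without importing it; None means "not installed here".
    spec = importlib.util.find_spec(package)
    return {
        "interpreter": sys.executable,
        "env_prefix": sys.prefix,
        # "conda activate" sets CONDA_DEFAULT_ENV; it is unset (None) outside an activated environment.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        f"{package}_location": spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `torch_location` points into a different environment than expected, activating the intended one first (`conda activate pulse3`) and then running `python run.py` keeps the interpreter and packages consistent.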

NVIDIA-SMI LOG WHILE RUNNING (checking free memory)

C:\Users\micha>nvidia-smi --query-gpu=memory.free --format=csv --loop=1
memory.free [MiB]

5991 MiB
5991 MiB
5991 MiB
5991 MiB
5897 MiB
5781 MiB
5685 MiB
1643 MiB
5991 MiB
5991 MiB

The program stopped at 1643 MiB.
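The readings above can be turned into a usage figure. A minimal sketch, assuming the first reading is the idle baseline: parse the `N MiB` values and subtract the lowest one from the baseline. Note that `--loop=1` samples only once per second, and the 1.91 GiB request that failed was never granted, so this polling cannot show the memory the program actually asked for.

```python
import re


def parse_free_mib(text):
    """Extract the numbers from 'N MiB' lines printed by nvidia-smi --query-gpu=memory.free --format=csv."""
    # The header line "memory.free [MiB]" has no number in front of MiB, so it is skipped.
    return [int(n) for n in re.findall(r"(\d+) MiB", text)]


def peak_usage_mib(readings):
    """Largest drop in free memory relative to the first (idle) reading."""
    return readings[0] - min(readings)


# The readings from the log above, one per line as nvidia-smi prints them.
log = """memory.free [MiB]
5991 MiB
5991 MiB
5991 MiB
5991 MiB
5897 MiB
5781 MiB
5685 MiB
1643 MiB
5991 MiB
5991 MiB"""

readings = parse_free_mib(log)
print(min(readings), peak_usage_mib(readings))  # prints: 1643 4348
```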

Details part 1:

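I have an NVIDIA GPU with 6 GB of memory. According to what the project creator posted elsewhere, that should be enough for it to work: "We ran our tests with 8GB of memory, but I believe you should also be able to run the code with 4GB." -adamian98. I am trying to fix the error and get run.py to work as intended.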

Here is the system information from running the numba -s command in Anaconda (including hardware information: GPU, Windows version, memory, Python version, etc.):

Start of CUDA information

Hardware Information
Machine : AMD64
CPU Name : znver1
CPU Count : 16
Number of accessible CPUs : 16
List of accessible CPUs cores : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
CFS Restrictions (CPUs worth of runtime) : None

CPU Features : 64bit adx aes avx avx2 bmi bmi2
clflushopt clwb clzero cmov cx16
cx8 f16c fma fsgsbase fxsr lzcnt
mmx movbe mwaitx pclmul popcnt
prfchw rdpid rdrnd rdseed sahf sha
sse sse2 sse3 sse4.1 sse4.2 sse4a
ssse3 wbnoinvd xsave xsavec
xsaveopt xsaves

Memory Total (MB) : 15789
Memory Available (MB) : 8421

OS Information
Platform Name : Windows-10-10.0.18362-SP0
Platform Release : 10
OS Name : Windows
OS Version : 10.0.18362
OS Specific Version : 10 10.0.18362 SP0 Multiprocessor Free
Libc Version : ?

Python Information
Python Compiler : MSC v.1916 64 bit (AMD64)
Python Implementation : CPython
Python Version : 3.8.5
Python Locale : en_CA.cp1252

LLVM Information
LLVM Version : 10.0.1

CUDA Information
CUDA Device Initialized : True
CUDA Driver Version : 10020
CUDA Detect Output:
Found 1 CUDA devices
id 0 b'GeForce GTX 1660 Ti with Max-Q Design' [SUPPORTED]
compute capability: 7.5
pci device id: 0
pci bus id: 1
Summary:
1/1 devices are supported

CUDA Librairies Test Output:
Finding cublas from
named cublas.dll
trying to open library... ERROR: failed to open cublas:
Could not find module 'cublas.dll' (or one of its dependencies). Try using the full path with constructor syntax.
Finding cusparse from
named cusparse.dll
trying to open library... ERROR: failed to open cusparse:
Could not find module 'cusparse.dll' (or one of its dependencies). Try using the full path with constructor syntax.
Finding cufft from
named cufft.dll
trying to open library... ERROR: failed to open cufft:
Could not find module 'cufft.dll' (or one of its dependencies). Try using the full path with constructor syntax.
Finding curand from
named curand.dll
trying to open library... ERROR: failed to open curand:
Could not find module 'curand.dll' (or one of its dependencies). Try using the full path with constructor syntax.
Finding nvvm from
named nvvm.dll
trying to open library... ERROR: failed to open nvvm:
Could not find module 'nvvm.dll' (or one of its dependencies). Try using the full path with constructor syntax.
Finding cudart from
named cudart.dll
trying to open library... ERROR: failed to open cudart:
Could not find module 'cudart.dll' (or one of its dependencies). Try using the full path with constructor syntax.
Finding libdevice from
searching for compute_20... ERROR: can't open libdevice for compute_20
searching for compute_30... ERROR: can't open libdevice for compute_30
searching for compute_35... ERROR: can't open libdevice for compute_35
searching for compute_50... ERROR: can't open libdevice for compute_50

ROC information
ROC Available : False
ROC Toolchains : None
HSA Agents Count : 0
HSA Agents:
None
HSA Discrete GPUs Count : 0
HSA Discrete GPUs : None

SVML Information
SVML State, config.USING_SVML : True
SVML Library Loaded : True
llvmlite Using SVML Patched LLVM : True
SVML Operational : True

Threading Layer Information
TBB Threading Layer Available : False
+--> Disabled due to Unknown import problem.
OpenMP Threading Layer Available : True
+-->Vendor: MS
Workqueue Threading Layer Available : True
+-->Workqueue imported successfully.

Numba Environment Variable Information
None found.

Conda Information
Conda Build : 3.20.5
Conda Env : 4.9.2
Conda Platform : win-64
Conda Python Version : 3.8.5.final.0
Conda Root Writable : True

Installed Packages
blas 1.0 mkl
ca-certificates 2020.12.8 haa95532_0
certifi 2020.12.5 py38haa95532_0
cffi 1.14.0 py38h7a1dbc1_0
chardet 3.0.4 py38haa95532_1003
cryptography 2.9.2 py38h7a1dbc1_0
cudatoolkit 10.2.89 h74a9793_1 anaconda
cycler 0.10.0 py38_0
freetype 2.9.1 ha9979f8_1
icc_rt 2019.0.0 h0cc432a_1
icu 58.2 ha925a31_3
idna 2.9 py_1
intel-openmp 2019.4 245
jpeg 9b hb83a4c4_2
kiwisolver 1.2.0 py38h74a9793_0
libcxx 7.0.0 h1ad3211_1002 conda-forge
libpng 1.6.37 h2a8f88b_0
libtiff 4.1.0 h56a325e_0
llvm-meta 7.0.0 0 conda-forge
m2-bash 4.3.042 5
m2-gcc-libs 5.3.0 4
m2-libedit 3.1 20150326
m2-libffi 3.2.1 2
m2-libreadline 6.3.008 8
m2-msys2-runtime 2.5.0.17080.65c939c 3
m2-ncurses 6.0.20160220 2
m2w64-gcc-libgfortran 5.3.0 6
m2w64-gcc-libs-core 5.3.0 7
m2w64-gmp 6.1.0 2
m2w64-libwinpthread-git 5.0.0.4634.697f757 2
matplotlib 3.1.3 py38_0
matplotlib-base 3.1.3 py38h64f37c6_0
mkl 2019.4 245
mkl-service 2.3.0 py38h196d8e1_0
mkl_fft 1.0.15 py38h14836fe_0
mkl_random 1.1.0 py38hf9181ef_0
msys2-conda-epoch 20160418 1
ninja 1.9.0 py38h74a9793_0
numpy 1.18.1 py38h93ca92e_0
numpy-base 1.18.1 py38hc3f5095_1
olefile 0.46 py_0
openssl 1.1.1i h2bbff1b_0
pandas 1.0.3 py38h47e9c7a_0
pillow 7.1.2 py38hcc1f983_0
pip 20.0.2 py38_3
powershell_shortcut 0.0.1 3
pycparser 2.20 py_2
pyopenssl 19.1.0 pyhd3eb1b0_1
pyparsing 2.4.7 py_0
pyqt 5.9.2 py38ha925a31_4
pysocks 1.7.1 py38haa95532_0
python 3.8.2 he1778fa_13
python-dateutil 2.8.1 py_0
pytorch 1.5.0 py3.8_cuda102_cudnn7_0 pytorch
pytz 2020.1 py_0
qt 5.9.7 vc14h73c81de_0
requests 2.23.0 py38_0
scipy 1.4.1 py38h9439919_0
setuptools 46.2.0 py38_0
sip 4.19.13 py38ha925a31_0
six 1.14.0 py38haa95532_0
sqlite 3.31.1 h2a8f88b_1
tk 8.6.8 hfa6e2cd_0
torchvision 0.6.0 py38_cu102 pytorch
tornado 6.0.4 py38he774522_1
urllib3 1.25.8 py38_0
vc 14.2 h21ff451_1
vs2015_runtime 14.27.29016 h5e58377_2
wheel 0.34.2 py38_0
win_inet_pton 1.1.0 py38haa95532_0
wincertstore 0.2 py38_0
xz 5.2.5 h62dcd97_0
zlib 1.2.11 h62dcd97_4
zstd 1.3.7 h508b16e_0

No errors reported.

Warning log
Warning (roc): Error initialising ROC: No ROC toolchains found.
Warning (roc): No HSA Agents found, encountered exception when searching: Error at driver init:

HSA is not currently supported on this platform (win32).

End of CUDA information

The following section is most likely not needed for this problem, but just in case:

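Details part 2 (it sounds like dlib (and CMake) are only needed beforehand to align faces, so in theory run.py doesn't need them, but I'm not 100% sure. I have left out the part containing the dlib-specific errors):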

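I successfully installed the .yml into an Anaconda environment I named "pulse3" without any errors, along with CUDA Toolkit 10.2 and CMake 3.17.2. Dlib was the only package that gave me trouble: on Windows with Python 3.8.2 it spat out a pile of incompatibility errors with other packages.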

Answer 1

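Based on new log evidence, gathered by running this command alongside run.py

"C:\Users\micha>nvidia-smi --query-gpu=memory.free --format=csv --loop=1"

in addition to monitoring with GPU-Z,

and on the error produced when executing run.py in the Anaconda environment, with every required package except dlib installed correctly: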

RuntimeError: CUDA out of memory. Tried to allocate 1.91 GiB (GPU 0; 6.00 GiB total capacity; 3.92 GiB already allocated; 744.91 MiB free; 3.93 GiB reserved in total by PyTorch)

The GTX 1660 Ti recorded a maximum VRAM usage of 4.502 GB and never went above it.

Therefore, I can strongly infer that the project PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models (https://github.com/adamian98/pulse) needs somewhere between 6.01 GB and 8.00 GB of VRAM to execute the "run.py" file, so a 6 GB GPU is not enough.

So the error:

RuntimeError: CUDA out of memory. Tried to allocate 1.91 GiB (GPU 0; 6.00 GiB total capacity; 3.92 GiB already allocated; 744.91 MiB free; 3.93 GiB reserved in total by PyTorch)

is most likely caused by a hardware limitation.
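The numbers in the error message explain why the monitored peak (4.502 GB) stays below the card's capacity: the 1.91 GiB request was refused, so it never shows up as used memory. A minimal sketch that pulls the sizes out of the message (the regular expressions assume the exact wording of the message quoted above) and does the arithmetic: PyTorch already held 3.92 GiB and asked for 1.91 GiB more, 5.83 GiB in total, while only 744.91 MiB was free.

```python
import re

# Conversion factors to GiB for the units PyTorch prints.
TO_GIB = {"KiB": 1 / 1024**2, "MiB": 1 / 1024, "GiB": 1.0}

# One pattern per quantity; assumes the wording of the message quoted in this post.
PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) (KiB|MiB|GiB)",
    "total": r"([\d.]+) (KiB|MiB|GiB) total capacity",
    "allocated": r"([\d.]+) (KiB|MiB|GiB) already allocated",
    "free": r"([\d.]+) (KiB|MiB|GiB) free",
    "reserved": r"([\d.]+) (KiB|MiB|GiB) reserved in total",
}


def parse_oom(message):
    """Return the sizes in a PyTorch 'CUDA out of memory' message, converted to GiB."""
    sizes = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{name}' in message")
        value, unit = match.groups()
        sizes[name] = float(value) * TO_GIB[unit]
    return sizes


msg = ("CUDA out of memory. Tried to allocate 1.91 GiB (GPU 0; 6.00 GiB total capacity; "
       "3.92 GiB already allocated; 744.91 MiB free; 3.93 GiB reserved in total by PyTorch)")
s = parse_oom(msg)
needed = s["allocated"] + s["requested"]          # memory PyTorch wanted for tensors
outside = s["total"] - s["reserved"] - s["free"]  # used on the card but not by PyTorch's cache
print(f"{needed:.2f} GiB needed, {s['free']:.2f} GiB free, {outside:.2f} GiB outside PyTorch's cache")
# prints: 5.83 GiB needed, 0.73 GiB free, 1.34 GiB outside PyTorch's cache
```

Total minus reserved minus free (about 1.34 GiB) is memory on the card that PyTorch's caching allocator does not account for, such as the CUDA context and anything else using the GPU. Since 5.83 GiB of tensors plus that overhead exceeds 6.00 GiB, the failure is consistent with a hardware limit at this point in the run; later steps could need even more.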

Notes to help improve the PULSE project:

This result, although not conclusive, contradicts the project creator's estimate:

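"Unfortunately, 2GB of memory is not enough to store all the gradients needed during optimization. We ran our tests with 8GB of memory, but I believe you should also be able to run the code with 4GB." -adamian98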

Possible contributing factors (see the relevant logs above for more information if needed):

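1. Needing to execute the .py file from outside the Anaconda environment (even though the file is also inside the environment)

2. dlib not being installed successfully

3. The input image being the wrong size (it is 75x77 pixels and was taken from the project's examples)

4. The input image being in the wrong format (it was .png)

5. The image being in the wrong location: \anaconda3\envs\pulse3\Library\qml\Qt3D (the Anaconda environment is named pulse3, which is why the path uses pulse3)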
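Factors 3 and 4 can be checked with the standard library alone. Every PNG file begins with a fixed 8-byte signature followed by an IHDR chunk that stores the width, height, bit depth and color type. A minimal sketch reads those fields; the folder name "input" is an assumption about where run.py looks, so pass the real folder if it differs. It only reports what each file contains (for example whether it is really 75x77, and whether it carries an alpha channel); compare that with what the PULSE README asks for.

```python
import struct
import sys
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# PNG color types from the PNG specification.
COLOR_TYPES = {0: "grayscale", 2: "RGB", 3: "palette", 4: "grayscale+alpha", 6: "RGBA"}


def png_info(data):
    """Return (width, height, bit_depth, color_type) read from the IHDR chunk of PNG bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Bytes 8-15: length and type of the first chunk, which the spec requires to be IHDR.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("malformed PNG: first chunk is not IHDR")
    # IHDR starts with width and height (4 bytes each), then bit depth and color type (1 byte each).
    return struct.unpack(">IIBB", data[16:26])


if __name__ == "__main__":
    # Assumption: the images are in a folder named "input"; pass another folder as an argument.
    folder = Path(sys.argv[1] if len(sys.argv) > 1 else "input")
    for path in sorted(folder.glob("*.png")):
        width, height, depth, ctype = png_info(path.read_bytes())
        print(f"{path.name}: {width}x{height}, {depth}-bit {COLOR_TYPES.get(ctype, ctype)}")
```

A file that merely has a .png extension but is stored in another format raises "not a PNG file", which rules factor 4 in or out directly.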