How to install Theano on Ubuntu 14.04 x64, and configure it so that it uses the GPU?
I tried following the instructions in Easy Installation of an Optimized Theano on Current Ubuntu, but it doesn't work: whenever I run a Theano script that uses the GPU, I get this error message:
CUDA is installed, but device gpu is not available (error: Unable to get the number of gpus available: no CUDA-capable device is detected)
More specifically, following the instructions on the linked page, I performed these steps:
# Install Theano
sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
sudo pip install Theano
# Install Nvidia drivers and CUDA
sudo apt-get install nvidia-current
sudo apt-get install nvidia-cuda-toolkit
Then I rebooted and tried running:
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python gpu_test.py # gpu_test.py comes from http://deeplearning.net/software/theano/tutorial/using_gpu.html
But I got:
f@f-Aurora-R4:~$ THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32,cuda.root=/usr/lib/nvidia-cuda-toolkit' python gpu_test.py
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not available (error: Unable to get the number of gpus available: no CUDA-capable device is detected)
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 2.199992 seconds
Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761 1.62323284]
Used the cpu
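(I tested the following on Ubuntu 14.04.4 LTS x64 and Kubuntu 14.04.4 LTS x64; I expect it to work on most Ubuntu variants.)
Installing Theano and configuring the GPU (CUDA)
The instructions on the official website are outdated. Instead, you can use the following instructions (assuming a freshly installed Kubuntu 14.04 LTS x64):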
# Install Theano
sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
sudo pip install Theano
# Install Nvidia drivers, CUDA and CUDA toolkit, following some instructions from http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb # Got the link at https://developer.nvidia.com/cuda-downloads
sudo dpkg -i cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get install cuda
sudo reboot
At that point, running nvidia-smi should work, but running nvcc won't work yet.
# Execute in console, or (add in ~/.bash_profile then run "source ~/.bash_profile"):
export PATH=/usr/local/cuda-7.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH
At that point, both nvidia-smi and nvcc should work.
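If you want to check the PATH change from a script rather than by hand, something along these lines could do it. This is only a sketch: find_on_path is a helper name I made up, and it merely looks for an executable file in each PATH directory; it does not prove that the driver or compiler actually runs.

```python
import os


def find_on_path(name, search_path=None):
    """Return the full path of the executable `name`, or None if it is not found."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        # Accept only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    for tool in ("nvidia-smi", "nvcc"):
        print("%s: %s" % (tool, find_on_path(tool) or "not found on PATH"))
```

If nvcc is reported as not found, the two export lines above have most likely not been applied in the current shell.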
To test whether Theano is able to use the GPU:
Copy and paste the following into gpu_test.py:
# Start gpu_test.py
# From http://deeplearning.net/software/theano/tutorial/using_gpu.html#using-gpu
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time
vlen = 10 * 30 * 768 # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
# End gpu_test.py
and run it:
THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32' python gpu_test.py
which should return:
f@f-Aurora-R4:~$ THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32' python gpu_test.py
Using gpu device 0: GeForce GTX 690
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.658292 seconds
Result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761
1.62323296]
Used the gpu
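The final check in gpu_test.py looks at the op types in the compiled graph. If all you have is the printed graph from a log, a rough text-based version of the same idea might look like the sketch below. device_from_graph is a hypothetical helper, and matching on the "GpuElemwise" name is an assumption drawn from the two graph printouts in this post, not a Theano API.

```python
def device_from_graph(graph_text):
    """Guess 'gpu' or 'cpu' from the printed result of f.maker.fgraph.toposort()."""
    # In the printouts in this post, the GPU run shows GpuElemwise and the
    # CPU run shows plain Elemwise, so the op name is used as the signal.
    return "gpu" if "GpuElemwise" in graph_text else "cpu"


cpu_graph = "[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]"
gpu_graph = ("[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), "
             "HostFromGpu(GpuElemwise{exp,no_inplace}.0)]")

print(device_from_graph(cpu_graph))
print(device_from_graph(gpu_graph))
```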
To find out your CUDA version:
nvcc -V
Example:
username@server:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
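If you need the version number in a script, the last line of that output can be parsed. A minimal sketch, assuming the "release X.Y, VX.Y.Z" wording shown in the example above (other toolkit versions may word it differently); parse_nvcc_version is a name I chose for illustration:

```python
import re


def parse_nvcc_version(text):
    """Return (release, full_version) from `nvcc -V` output, or None if absent."""
    match = re.search(r"release\s+(\d+(?:\.\d+)*),\s+V(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Sample text copied from the example output in this post.
sample = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17"""

print(parse_nvcc_version(sample))  # ('7.5', '7.5.17')
```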
Adding cuDNN
To add cuDNN (instructions from http://deeplearning.net/software/theano/library/sandbox/cuda/dnn.html):
- Download cuDNN from https://developer.nvidia.com/rdp/cudnn-download (requires registration, which is free)
tar -xvf cudnn-7.0-linux-x64-v3.0-prod.tgz
- Do one of the following
Option 1: Copy the *.h files to CUDA_ROOT/include and the *.so* files to CUDA_ROOT/lib64 (by default, CUDA_ROOT is /usr/local/cuda on Linux).
sudo cp cuda/lib64/* /usr/local/cuda/lib64/
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
Option 2:
export LD_LIBRARY_PATH=/home/user/path_to_CUDNN_folder/lib64:$LD_LIBRARY_PATH
export CPATH=/home/user/path_to_CUDNN_folder/include:$CPATH
export LIBRARY_PATH=/home/user/path_to_CUDNN_folder/lib64:$LIBRARY_PATH
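By default, Theano detects whether it can use cuDNN. If so, it uses it. If not, Theano optimizations will not introduce cuDNN ops, so Theano still works as long as the user did not introduce them manually.
To get an error if Theano cannot use cuDNN, use this Theano flag: optimizer_including=cudnn.
Example: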
THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32,optimizer_including=cudnn' python gpu_test.py
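The flag strings in this post get long, so if you launch Theano from a Python script you may prefer to assemble THEANO_FLAGS in code. The sketch below assumes Theano reads the variable when it is first imported, so the variable has to be set before that import; build_theano_flags is a hypothetical helper, not part of Theano.

```python
import os


def build_theano_flags(flags):
    """Join a dict of flag names and values into a THEANO_FLAGS string."""
    # Sorting only makes the result reproducible; the order of flags in the
    # string is not meant to carry any meaning here.
    return ",".join("%s=%s" % (key, flags[key]) for key in sorted(flags))


flags = build_theano_flags({
    "mode": "FAST_RUN",
    "device": "gpu",
    "floatX": "float32",
    "optimizer_including": "cudnn",
})
os.environ["THEANO_FLAGS"] = flags
print(flags)
# `import theano` would go here, after the environment variable is set.
```

This prints device=gpu,floatX=float32,mode=FAST_RUN,optimizer_including=cudnn, which is the same set of flags as the command line above.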
To find out your cuDNN version:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
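The grep prints the three #define lines for the major, minor and patch level. Here is a sketch of how those could be combined into the single number Theano reports (e.g. "cuDNN 5005"), assuming the header computes CUDNN_VERSION as MAJOR * 1000 + MINOR * 100 + PATCHLEVEL, which as far as I know holds for cuDNN headers of this era. The sample header text is made up for illustration, and cudnn_version is a hypothetical helper.

```python
import re


def cudnn_version(header_text):
    """Return the combined cuDNN version number from cudnn.h text, or None."""
    parts = {}
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(r"#define\s+CUDNN_%s\s+(\d+)" % name, header_text)
        if match is None:
            return None
        parts[name] = int(match.group(1))
    # Assumed formula: MAJOR * 1000 + MINOR * 100 + PATCHLEVEL.
    return parts["MAJOR"] * 1000 + parts["MINOR"] * 100 + parts["PATCHLEVEL"]


# Hypothetical excerpt in the style of cudnn.h, not copied from a real header.
sample_header = """
#define CUDNN_MAJOR      5
#define CUDNN_MINOR      0
#define CUDNN_PATCHLEVEL 5
"""

print(cudnn_version(sample_header))  # 5005
```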
Adding CNMeM
The CNMeM library is a "Simple library to help the Deep Learning frameworks manage CUDA memory."
# Build CNMeM without the unit tests
git clone https://github.com/NVIDIA/cnmem.git cnmem
cd cnmem
mkdir build
cd build
sudo apt-get install -y cmake
cmake ..
make
# Copy files to proper location
sudo cp ../include/cnmem.h /usr/local/cuda/include
sudo cp *.so /usr/local/cuda/lib64/
cd ../..
To use it with Theano, you need to add the lib.cnmem flag. Example:
THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32,lib.cnmem=0.8,optimizer_including=cudnn' python gpu_test.py
The first output of the script should be:
Using gpu device 0: GeForce GTX TITAN X (CNMeM is enabled with initial size: 80.0% of memory, cuDNN 5005)
lib.cnmem=0.8 means that it can use up to 80% of the GPU's memory.
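To get a feel for what that fraction means in absolute terms, here is a small worked calculation. The 12288 MB total is a made-up figure for illustration (check nvidia-smi for the real value on your card), and cnmem_pool_mb is a hypothetical helper, not a Theano or CNMeM function.

```python
def cnmem_pool_mb(total_memory_mb, fraction):
    """Return the pool size in MB for a lib.cnmem fraction between 0 and 1."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    return total_memory_mb * fraction


# Hypothetical card with 12288 MB of memory and lib.cnmem=0.8.
print("%.1f MB" % cnmem_pool_mb(12288, 0.8))
```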
CNMeM has been reported to give some interesting speed improvements, and is supported by Theano, Torch, and Caffe.
The speed-up depends on many factors, such as the shapes and the model itself. It ranges from 0 to 2x faster.
If you don't change the Theano flag allow_gc, you can expect a 20% speed-up on the GPU. In some cases (small models), we saw a 50% speed-up.
Running Theano on multiple CPU cores
As a side note, you can run Theano on multiple CPU cores with the OMP_NUM_THREADS=[number_of_cpu_cores] flag. Example:
OMP_NUM_THREADS=4 python gpu_test.py
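The same variable can also be set from inside a script. A minimal sketch, assuming the OpenMP runtime reads OMP_NUM_THREADS when the numerical libraries are first loaded, so it is set before importing numpy or theano; using every core reported by cpu_count is my choice here, not a recommendation from the Theano docs.

```python
import multiprocessing
import os

# Keep a value already set in the shell; otherwise use every available core.
os.environ.setdefault("OMP_NUM_THREADS", str(multiprocessing.cpu_count()))
print("OMP_NUM_THREADS=%s" % os.environ["OMP_NUM_THREADS"])

# `import numpy` and `import theano` would go here, after the variable is set.
```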
The script theano/misc/check_blas.py outputs information about which BLAS is used:
cd [theano_git_directory]
OMP_NUM_THREADS=4 python theano/misc/check_blas.py
To run Theano's test suite:
nosetests theano
or (the last two lines go in a Python session):
sudo pip install nose-parameterized
import theano
theano.test()