Could not insert 'nvidia_352': No such device

I'm trying to run Caffe on Ubuntu Linux. After installing it, I ran Caffe on the GPU and got this error:

I0910 13:28:13.606891 10629 caffe.cpp:296] Use GPU with device ID 0
modprobe: ERROR: could not insert 'nvidia_352': No such device
F0910 13:28:13.728612 10629 common.cpp:142] Check failed: error == cudaSuccess (38 vs. 0)  no CUDA-capable device is detected
*** Check failure stack trace: ***
    @     0x7ffd3b9a7daa  (unknown)
    @     0x7ffd3b9a7ce4  (unknown)
    @     0x7ffd3b9a76e6  (unknown)
    @     0x7ffd3b9aa687  (unknown)
    @     0x7ffd3bf91cb5  caffe::Caffe::SetDevice()
    @           0x40a5a7  time()
    @           0x4080f8  main
    @     0x7ffd3aeb9ec5  (unknown)
    @           0x408618  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)

My NVIDIA driver is 352.41. I installed nvidia-352, and it is already the newest version:

sudo apt-get install nvidia-352
Reading package lists... Done
Building dependency tree       
Reading state information... Done
nvidia-352 is already the newest version.
The following packages were automatically installed and are no longer required:
  account-plugin-windows-live libupstart1
Use 'apt-get autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 31 not upgraded.

My Ubuntu system has NVIDIA driver 352, so why do I still get the "could not insert 'nvidia_352': No such device" and "no CUDA-capable device is detected" errors shown above?

I checked whether I have a CUDA-capable device:

lspci | grep -i nvidia
05:00.0 VGA compatible controller: NVIDIA Corporation GK107GL [Quadro K2000] (rev a1)
05:00.1 Audio device: NVIDIA Corporation GK107 HDMI Audio Controller (rev a1)
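Note that lspci only shows the card is present on the PCI bus; it says nothing about whether the nvidia kernel module loaded, which is the step that fails here. As a small sketch, the presence check can be scripted against lspci output read from stdin. The function name has_nvidia_gpu is hypothetical, and it assumes GPUs are listed with the usual "VGA compatible controller" or "3D controller" wording seen above:

```shell
# Succeeds (exit status 0) if lspci output read from stdin lists an NVIDIA
# display controller. Assumes GPUs appear as "VGA compatible controller"
# or "3D controller" lines, as in the output above.
has_nvidia_gpu() {
    grep -qiE '(VGA compatible controller|3D controller): NVIDIA'
}

# Usage on a real machine:
#   lspci | has_nvidia_gpu && echo "NVIDIA GPU is on the PCI bus"
```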

So I do have a CUDA-capable device. Why does the error occur?

Edit 1: Yes, my ./deviceQuery test fails.

../NVIDIA_CUDA-7.5_Samples/bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL

I looked in the /dev/ folder, and nvidia0 is there:

crwxrwxrwx  1 root root    195,   0 Sep 10 16:51 nvidia0
crw-rw-rw-  1 root root    195, 255 Sep 10 16:51 nvidiactl

My nvcc -V check gives:

li@li-HP-Z420-Workstation:/dev$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

Then my version check:

li@li-HP-Z420-Workstation:/dev$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.41  Fri Aug 21 23:09:52 PDT 2015
GCC version:  gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04) 

What is the problem?

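The problem is now solved. I ran sudo dpkg --list | grep nvidia and found that my kernel module was 352.41 but the client was 304.12. So I ran sudo apt-get remove --purge nvidia-*, which removed all the NVIDIA packages. Then I installed 352.41 as follows: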

$ sudo add-apt-repository ppa:xorg-edgers/ppa -y
$ sudo apt-get update
$ sudo apt-get install nvidia-352

After that:

$ sudo dpkg --list | grep nvidia
rc nvidia-304 304.128-0ubuntu0~gpu14.04.2 amd64 NVIDIA legacy binary driver - version 304.128
rc nvidia-304-updates 304.125-0ubuntu0.0.2 amd64 NVIDIA legacy binary driver - version 304.125
ii nvidia-352 352.41-0ubuntu0~gpu14.04.1 amd64 NVIDIA binary driver - version 352.41
rc nvidia-opencl-icd-304 304.128-0ubuntu0~gpu14.04.2 amd64 NVIDIA OpenCL ICD
rc nvidia-opencl-icd-304-updates 304.125-0ubuntu0.0.2 amd64 NVIDIA OpenCL ICD
ii nvidia-opencl-icd-352 352.41-0ubuntu0~gpu14.04.1 amd64 NVIDIA OpenCL ICD
ii nvidia-prime 0.6.2 amd64 Tools to enable NVIDIA's Prime
ii nvidia-settings 355.11-0ubuntu0~gpu14.04.1 amd64 Tool for configuring the NVIDIA graphics driver

Now the versions match, and ./deviceQuery and everything else work as expected. Thanks.
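The fix above comes down to one comparison: the version reported by the loaded kernel module in /proc/driver/nvidia/version versus the versions of the installed nvidia-NNN packages. A minimal sketch of that comparison, assuming Ubuntu-style package names (nvidia-352, nvidia-304-updates) and the dpkg --list column layout shown above; the function names are hypothetical helpers, not part of any NVIDIA or Ubuntu tool:

```shell
# Read /proc/driver/nvidia/version text on stdin; print e.g. "352.41".
kernel_module_version() {
    sed -n 's/.*Kernel Module *\([0-9][0-9.]*\).*/\1/p'
}

# Read `dpkg --list` output on stdin; print "package upstream-version" for
# every installed (status "ii") nvidia-NNN or nvidia-NNN-updates package.
installed_driver_versions() {
    awk '$1 == "ii" && $2 ~ /^nvidia-[0-9]+(-updates)?$/ {
        v = $3
        sub(/-.*/, "", v)    # drop the Debian revision, keep e.g. "352.41"
        print $2, v
    }'
}

# driver_mismatch MODULE_VERSION, reading installed_driver_versions output
# on stdin. Prints each package whose version differs (compared as strings)
# and exits 1 if any do.
driver_mismatch() {
    awk -v want="$1" '($2 "") != (want "") {
        print $1 " " $2 " (module is " want ")"; bad = 1
    }
    END { exit bad }'
}

# Usage on a real machine:
#   mod=$(kernel_module_version < /proc/driver/nvidia/version)
#   dpkg --list | installed_driver_versions | driver_mismatch "$mod"
```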

I had this problem too, and reinstalling the NVIDIA driver did not solve it.

In the end I solved the problem by adding two kernel parameters via GRUB.

Add

pci=nocrs pci=realloc

to GRUB_CMDLINE_LINUX_DEFAULT.

I think this is a conflict between CUDA 7.5 and kernel 3.19.
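On Ubuntu, GRUB_CMDLINE_LINUX_DEFAULT is normally set in /etc/default/grub, and a change only takes effect after regenerating the GRUB configuration with sudo update-grub and rebooting. A sketch of the edit as a shell function; add_grub_params is a hypothetical helper, and it assumes GNU sed (for -i), a double-quoted GRUB_CMDLINE_LINUX_DEFAULT value, and parameters that contain no characters special to sed or grep (such as / or &):

```shell
# add_grub_params FILE PARAM...
# Appends each PARAM to the GRUB_CMDLINE_LINUX_DEFAULT="..." line in FILE,
# skipping parameters that are already present.
add_grub_params() {
    grub_file=$1
    shift
    for param in "$@"; do
        if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*[\" ]$param[\" ]" "$grub_file"; then
            continue    # already present, nothing to do
        fi
        # Insert " PARAM" just before the closing double quote.
        sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$grub_file"
    done
}

# On a real system (as root), then reboot:
#   add_grub_params /etc/default/grub pci=nocrs pci=realloc
#   update-grub
```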

I ran into this problem too. The answers above did not work for me, but installing the latest driver (nvidia-364) did. The commands I ran:

sudo add-apt-repository ppa:xorg-edgers/ppa 
sudo apt-get update 
sudo apt-get install nvidia-364

I think the problem occurs when different gcc versions are used to compile the driver module and the Linux kernel.
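One way to test this hypothesis is to compare the compiler recorded for the running kernel in /proc/version with the GCC version line in /proc/driver/nvidia/version (shown in the question). A sketch, assuming both files use the "gcc version X.Y.Z" wording found on Ubuntu 14.04-era systems (newer kernels word /proc/version differently, in which case the kernel side prints as unknown); the function names are hypothetical:

```shell
# Print the first "gcc version X.Y.Z" found on stdin as "X.Y.Z".
gcc_version_of() {
    sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# compare_build_gcc KERNEL_FILE MODULE_FILE
# Normally: compare_build_gcc /proc/version /proc/driver/nvidia/version
# Prints both versions; succeeds only if both are found and equal.
compare_build_gcc() {
    kernel_gcc=$(gcc_version_of < "$1")
    module_gcc=$(gcc_version_of < "$2")
    echo "kernel: gcc ${kernel_gcc:-unknown}, nvidia module: gcc ${module_gcc:-unknown}"
    [ -n "$kernel_gcc" ] && [ "$kernel_gcc" = "$module_gcc" ]
}
```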

Another option is to install from the .run file. That requires killing the X server first, which can be done as follows:

Make sure you are logged out.
Press Ctrl+Alt+F1 and log in with your credentials.
Kill your current X server session by typing sudo service lightdm stop or sudo stop lightdm.
Enter runlevel 3 (or 5) by typing sudo init 3 (or sudo init 5) and install your .run file.
You might need to reboot when the installation finishes. If not, run sudo service lightdm start or sudo start lightdm to start your X server again.

Then run the .run file with sudo sh xxxxx.run.
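The .run installer expects no X server to be running when it starts. A small guard, sketched under the assumption that X shows up in ps as a process named Xorg or X and that the display manager is lightdm as in the steps above; both function names are hypothetical:

```shell
# Succeeds if a process named Xorg or X appears in `ps -e -o comm=` output
# read from stdin (one command name per line).
x_server_running() {
    awk '$1 == "Xorg" || $1 == "X" { found = 1 } END { exit !found }'
}

# Run the installer only after X has been stopped, e.g.:
#   run_if_x_stopped xxxxx.run
run_if_x_stopped() {
    if ps -e -o comm= | x_server_running; then
        echo "X server still running; stop lightdm first" >&2
        return 1
    fi
    sudo sh "$1"
}
```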

You may get the error "The distribution-provided pre-install script failed! Are you sure you want to continue?". If so, abort the installation, disable the Nouveau kernel driver, and rebuild the initramfs with sudo update-initramfs -u.

Then reboot, stop the X server again, enter runlevel 3, and run sudo sh xxxx.run again.

This time you can ignore the message and continue past it. You will then be able to install the NVIDIA driver from the .run file.
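Disabling Nouveau is the step this answer leaves implicit. A commonly used approach is a modprobe blacklist file, written before rebuilding the initramfs. The sketch below checks lsmod output for nouveau and writes such a file; the path /etc/modprobe.d/blacklist-nouveau.conf and the function names are assumptions for illustration, not something the answer specifies:

```shell
# Succeeds if "nouveau" is listed in lsmod (or /proc/modules) output on stdin.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# write_nouveau_blacklist FILE
# Writes a modprobe config that stops nouveau from loading. On a real system
# FILE would be e.g. /etc/modprobe.d/blacklist-nouveau.conf (needs root),
# followed by: update-initramfs -u, then a reboot.
write_nouveau_blacklist() {
    printf 'blacklist nouveau\noptions nouveau modeset=0\n' > "$1"
}

# Usage on a real machine:
#   lsmod | nouveau_loaded && echo "nouveau is still loaded"
```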

If you are displaying video from a non-NVIDIA device but have the NVIDIA driver installed, you must install the driver with the --no-opengl-files flag for GNOME to work.

I suggest downloading a standalone driver and installing it manually after logging in on a console:

1. Press Ctrl+Alt+F2 (or F3/F4/F5) to get to a console.
2. Run "init 3" to stop the UI.
3. Log in on the console again if necessary.
4. wget http://us.download.nvidia.com/tesla/418.67/NVIDIA-Linux-x86_64-418.67.run
5. sh NVIDIA-Linux-x86_64-418.67.run --no-opengl-files
6. After the installation finishes, reboot.
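If you need a different version, steps 4 and 5 can be parameterized. A sketch, assuming the tesla/VERSION/ path layout from step 4 also holds for the version you pick (other driver branches may be published under different paths, so check NVIDIA's download page first); the function names are hypothetical:

```shell
# Installer file name for a driver version, e.g. 418.67.
driver_run_file() {
    echo "NVIDIA-Linux-x86_64-$1.run"
}

# Download URL, assuming the tesla/VERSION/ layout used in step 4.
driver_run_url() {
    echo "http://us.download.nvidia.com/tesla/$1/$(driver_run_file "$1")"
}

# Download and install from a text console with the UI stopped (steps 1-3).
# Not run automatically; call e.g.: install_driver_run 418.67
install_driver_run() {
    wget "$(driver_run_url "$1")" &&
        sudo sh "$(driver_run_file "$1")" --no-opengl-files
}
```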