Installing the NVIDIA driver and CUDA on Debian

  • This post is a personal record, not a tutorial
  • Versions to install:
    Driver: 440.64
    CUDA: 10.0.130
    cuDNN: 7.4.2
    tf: 2.0.0
    tf-gpu: 1.14.0
  • Archived versions

CUDA archive download page

cuDNN archive download page

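The laptop has a GTX 950M. The CUDA version that NVIDIA's site matches to it is 10.2, but the TensorFlow builds I wanted supported CUDA only up to 10.0 at the time. Installing 10.2 directly worked fine, whereas installing 10.0 failed with "Installation Failed. Using unsupported Compiler.", which led me to reinstall the system three times.
Lesson learned: Linux is not that fragile, so don't reinstall at the first sign of trouble!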

Installing the driver

  • Disable the existing driver (nouveau)
sudo vim /etc/modprobe.d/blacklist.conf
# Add:
blacklist nouveau
options nouveau modeset=0
  • sudo update-initramfs -u
  • Reboot
  • Verify (no output means nouveau is no longer loaded)
lsmod | grep nouveau
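The check above can be wrapped so that the result is spelled out instead of relying on empty output. This is only a sketch: check_nouveau is a hypothetical helper name, and it reads lsmod-style text from standard input.

```shell
# Report whether the nouveau module appears in lsmod-style text read from stdin.
# check_nouveau is a hypothetical helper name, not part of any existing tool.
check_nouveau() {
    if grep -q '^nouveau[[:space:]]'; then
        echo "nouveau is still loaded"
        return 1
    fi
    echo "nouveau is not loaded"
}

# Usage on a real system:
#   lsmod | check_nouveau
```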
  • Install the NVIDIA driver
  • Sources (from my experience: comment out the original line and uncomment the line starting with deb):
# src
deb http://mirrors.aliyun.com/debian/ buster main non-free contrib
deb-src http://mirrors.aliyun.com/debian/ buster main non-free contrib
deb http://mirrors.aliyun.com/debian-security buster/updates main
deb-src http://mirrors.aliyun.com/debian-security buster/updates main
deb http://mirrors.aliyun.com/debian/ buster-updates main non-free contrib
deb-src http://mirrors.aliyun.com/debian/ buster-updates main non-free contrib
deb http://mirrors.aliyun.com/debian/ buster-backports main non-free contrib
deb-src http://mirrors.aliyun.com/debian/ buster-backports main non-free contrib

# sudo apt-get update
# sudo apt-get upgrade
# sudo apt install aptitude


# Command:
sudo ./NVIDIA***.run --no-opengl-files --no-x-check --no-nouveau-check
  • Verify
# 1.
sudo apt-get install mesa-utils
# 2.
glxinfo | grep rendering
# 3.
$ nvidia-smi
Sun Mar 22 11:23:06 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 950M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   40C    P0    N/A /  N/A |      0MiB /  4046MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
  • Possibly useful:
    https://my.oschina.net/haopeng/blog/415999

  • Uninstall the driver

sudo /usr/bin/nvidia-uninstall

Installing CUDA

  • Command:
sudo sh cuda_***.run
  • Toolkit: Installed in /usr/local/cuda-10.0

Problems encountered

  • The first installation attempt failed with:
Error: cannot find Toolkit in /usr/local/cuda-10.0

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installation Failed. Using unsupported Compiler.
Samples:  Cannot find Toolkit in /usr/local/cuda-10.0
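  • I assumed I had broken the system and reinstalled it several times, always with the same result. All along I had been searching for "Cannot find Toolkit in /usr/local/cuda-10.0", until I noticed the word "Compiler" on the line above it (at first glance I had read it as something like "computer"). Searching for "Installation Failed. Using unsupported Compiler." showed that the cause is a gcc version the installer does not support, and that there is a workaround (append -override to the command):
# Tested, works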
sudo ./cuda_***.run -override
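Before reaching for -override, it can help to see whether the installed gcc is the likely culprit. The sketch below assumes that CUDA 10.0 accepts gcc up to major version 7 (Debian buster ships gcc 8 by default); both the limit variable and the helper name gcc_ok_for_cuda10 are hypothetical and exist only for illustration.

```shell
# Highest gcc major version assumed to be accepted by CUDA 10.0 (assumption: 7).
CUDA10_MAX_GCC_MAJOR=7

# gcc_ok_for_cuda10 VERSION: hypothetical helper that succeeds only when the
# major part of a version string such as "8.3.0" or "8" is within the limit.
gcc_ok_for_cuda10() {
    major=${1%%.*}
    [ "$major" -le "$CUDA10_MAX_GCC_MAJOR" ]
}

# Usage on a real system:
#   if gcc_ok_for_cuda10 "$(gcc -dumpversion)"; then
#       echo "gcc should be accepted by the installer"
#   else
#       echo "gcc is too new: use -override or an older gcc"
#   fi
```

Note that -override only skips the installer's compiler check; whether nvcc then compiles code with the newer gcc is a separate question.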
  • With -override, the installation appears to succeed:
===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-10.0
Samples:  Installed in /home/xieyipeng, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-10.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-10.0/lib64, or, add /usr/local/cuda-10.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-10.0/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.0/doc/pdf for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 10.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run -silent -driver

Logfile is /tmp/cuda_install_7279.log
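  • The warning at the end of that output alarmed me: it says the installation is incomplete because the CUDA driver was not installed, and that CUDA 10.0 needs a driver of at least version 384.00.
  • Yet nvidia-smi above clearly reports driver version 440.64. Perhaps it is only a warning and not an error.

  • It may also be that the driver is already present and the warning appears only because I chose not to install the driver when running the CUDA installer.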
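One way to settle the doubt is to compare the installed driver version with the 384.00 minimum from the warning. The sketch below uses a hypothetical helper, version_at_least, and assumes GNU sort with the -V (version sort) option is available.

```shell
# version_at_least CURRENT MINIMUM: succeed if CURRENT >= MINIMUM.
# Hypothetical helper; relies on GNU sort's -V (version sort) option.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Usage on a real system:
#   driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   version_at_least "$driver" 384.00 && echo "driver is new enough for CUDA 10.0"
```

With the 440.64 driver reported earlier, the comparison against 384.00 succeeds, which is consistent with reading the message as a notice that the installer itself did not install a driver.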

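  • Configure environment variables (vim ~/.bashrc):

export CUDA_HOME=/usr/local/cuda-10.0
# Keep any existing LD_LIBRARY_PATH entries instead of overwriting them
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export PATH=${CUDA_HOME}/bin:${PATH}
  • Apply the changes: source ~/.bashrc
  • Check: nvcc -V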
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
  • Build and run the deviceQuery sample
    cd /usr/local/cuda/samples/1_Utilities/deviceQuery # path depends on where the samples were installed
    make
    sudo ./deviceQuery
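If the toolkit release needs to be checked from a script, for example to confirm it is the 10.0 that the framework expects, it can be pulled out of the nvcc -V output. A minimal sketch, with nvcc_release being a hypothetical helper name:

```shell
# nvcc_release: hypothetical helper that reads `nvcc -V` output on stdin and
# prints the release number found on the "release X.Y" line.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Usage on a real system:
#   nvcc -V | nvcc_release
```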
  • Uninstall CUDA
sudo /usr/local/cuda-10.0/bin/uninstall_cuda_10.0.pl

cuDNN

  • Install and verify
xieyipeng@debian:~/下载/last/cuda$ sudo cp include/cudnn.h /usr/local/cuda-10.0/include/
[sudo] password for xieyipeng:
xieyipeng@debian:~/下载/last/cuda$ sudo cp lib64/libcudnn* /usr/local/cuda-10.0/lib64/
xieyipeng@debian:~/下载/last/cuda$ sudo chmod a+r /usr/local/cuda-10.0/include/cudnn.h 
xieyipeng@debian:~/下载/last/cuda$ sudo chmod a+r /usr/local/cuda-10.0/lib64/libcudnn*
xieyipeng@debian:~/下载/last/cuda$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 2
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"
xieyipeng@debian:~/下载/last/cuda$ 
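The three #define lines in the grep output can be combined into a single version string. The sketch below assumes the header has the layout shown in that output; cudnn_version is a hypothetical helper name.

```shell
# cudnn_version HEADER: print MAJOR.MINOR.PATCHLEVEL read from a cudnn.h-style header.
# Hypothetical helper; assumes the three #define lines seen in the grep output.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { print major "." minor "." patch }' "$1"
}

# Usage on a real system:
#   cudnn_version /usr/local/cuda-10.0/include/cudnn.h
```

For the header contents shown in the grep output, this prints 7.4.2.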

Reposted from blog.csdn.net/xieyipeng1998/article/details/105022736