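- This post is just a record of what I did, not a tutorial.
- Versions installed:
Driver: 440.64
CUDA: 10.0.130
cuDNN: 7.4.2
tf: 2.0.0
tf-gpu: 1.14.0
- Version history
The machine has a GTX 950M. The CUDA version NVIDIA's site pairs with it is 10.2, but the TensorFlow deep learning framework I wanted to set up currently supports CUDA only up to 10.0. Installing 10.2 directly worked fine, but installing 10.0 failed with
Installation Failed. Using unsupported Compiler.
and that problem led me to reinstall the whole system 3 times...
Lesson learned: Linux is not that fragile, so don't reinstall at the first sign of trouble!!!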
Installing the driver
- Disable the original (nouveau) driver
sudo vim /etc/modprobe.d/blacklist.conf
# add:
blacklist nouveau
options nouveau modeset=0
sudo update-initramfs -u
- Reboot
- Verify (no output means nouveau is no longer loaded)
lsmod | grep nouveau
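The blacklist step above edits the file by hand in vim. If you prefer a script, here is a minimal sketch that appends the two lines only when they are missing, so running it twice is harmless. The helper names append_once and blacklist_nouveau are hypothetical. The target file is passed as a parameter so you can try it on a scratch file first.

```shell
# Hypothetical helper: append a line to a file only if that exact line is absent.
append_once() {
    file=$1
    line=$2
    # -x: match the whole line, -F: fixed string, -q: no output.
    # If the file does not exist yet, grep fails and the line is appended.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Hypothetical helper: make sure both nouveau lines are present in the given file.
blacklist_nouveau() {
    conf=$1
    append_once "$conf" "blacklist nouveau"
    append_once "$conf" "options nouveau modeset=0"
}
```

For the real change, run blacklist_nouveau /etc/modprobe.d/blacklist.conf from a root shell after sourcing the functions. Then run sudo update-initramfs -u as above.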
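- Install the NVIDIA driver
- Package sources (from my own experience: comment out the original line and uncomment the line that starts with deb):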
# src
deb http://mirrors.aliyun.com/debian/ buster main non-free contrib
deb-src http://mirrors.aliyun.com/debian/ buster main non-free contrib
deb http://mirrors.aliyun.com/debian-security buster/updates main
deb-src http://mirrors.aliyun.com/debian-security buster/updates main
deb http://mirrors.aliyun.com/debian/ buster-updates main non-free contrib
deb-src http://mirrors.aliyun.com/debian/ buster-updates main non-free contrib
deb http://mirrors.aliyun.com/debian/ buster-backports main non-free contrib
deb-src http://mirrors.aliyun.com/debian/ buster-backports main non-free contrib
# sudo apt-get update
# sudo apt-get upgrade
# sudo apt install aptitude
# command:
sudo ./NVIDIA***.run --no-opengl-files --no-x-check --no-nouveau-check
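The source list above is edited by commenting and uncommenting lines, which makes it easy to lose track of which entries apt will actually read. This is a small sketch that prints only the active deb and deb-src lines of a sources file. The name active_sources is a hypothetical helper. On Debian the file is normally /etc/apt/sources.list.

```shell
# Hypothetical helper: print the deb/deb-src lines apt would actually use,
# i.e. lines that are not commented out (leading whitespace is allowed).
active_sources() {
    # ERE: optional indentation, then "deb" or "deb-src", then whitespace.
    grep -E '^[[:space:]]*deb(-src)?[[:space:]]' "$1"
}
```

For example, active_sources /etc/apt/sources.list should list only the mirror lines you meant to enable.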
- Verify
# 1.
sudo apt-get install mesa-utils
# 2.
glxinfo | grep rendering
# 3.
$ nvidia-smi
Sun Mar 22 11:23:06 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 950M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   40C    P0    N/A /  N/A |      0MiB /  4046MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
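nvidia-smi prints the driver version in its header line. This minimal sketch pulls that number out of the text so a script can check it. The name driver_version_from_smi is hypothetical. It reads nvidia-smi output on standard input instead of calling nvidia-smi itself, so you can test it against a saved copy of the output above. nvidia-smi can also report the version directly with --query-gpu=driver_version --format=csv,noheader.

```shell
# Hypothetical helper: read nvidia-smi's table on stdin and print the
# number that follows "Driver Version:" (e.g. 440.64).
driver_version_from_smi() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}
```

Usage: nvidia-smi | driver_version_from_smi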
- Possibly useful: https://my.oschina.net/haopeng/blog/415999
- Uninstall the driver
sudo /usr/bin/nvidia-uninstall
Installing CUDA
- Command:
sudo sh cuda_***.run
# on success the summary shows:
Toolkit: Installed in /usr/local/cuda-10.0
Problems encountered
- The first installation attempt produced the following:
Error: cannot find Toolkit in /usr/local/cuda-10.0
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installation Failed. Using unsupported Compiler.
Samples: Cannot find Toolkit in /usr/local/cuda-10.0
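- I assumed I had broken the system and reinstalled it again and again, always with the same result. I kept searching (on Baidu) for "Cannot find Toolkit in /usr/local/cuda-10.0" until I suddenly noticed the word "Compiler" in the line above it. It looks a bit like "computer", but it means a compiler. Searching for "Installation Failed. Using unsupported Compiler." showed that the cause is a gcc version the installer does not support. It also gave a fix: append -override to the command: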
# tested myself, it works
sudo ./cuda_***.run -override
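The -override flag skips the installer's compiler check rather than fixing the mismatch, so it is worth knowing which gcc you actually have. This sketch assumes two things the original does not state. First, CUDA 10.0 accepts a host gcc up to major version 7. Second, Debian buster ships gcc 8. Check both against the installation guide for your CUDA version. The helpers gcc_major and gcc_supported are hypothetical. They take the version string as an argument, so the logic can be tried without gcc.

```shell
# Hypothetical helper: print the major version from a gcc version string,
# e.g. "8.3.0" -> 8. gcc -dumpversion may print "8" or "8.3.0"
# depending on how gcc was built; both work here.
gcc_major() {
    printf '%s\n' "$1" | cut -d. -f1
}

# Hypothetical check: succeed if the major version is <= the given maximum.
# ASSUMPTION: 7 is used as the default CUDA 10.0 limit; verify it yourself.
gcc_supported() {
    version=$1
    max_major=${2:-7}
    [ "$(gcc_major "$version")" -le "$max_major" ]
}
```

For example: gcc_supported "$(gcc -dumpversion)" || echo "gcc is newer than expected; the installer will likely need -override"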
- This time it seems to have succeeded:
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-10.0
Samples: Installed in /home/xieyipeng, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-10.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-10.0/lib64, or, add /usr/local/cuda-10.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-10.0/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.0/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 10.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver
Logfile is /tmp/cuda_install_7279.log
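- But the WARNING at the end alarmed me. It says the installation is incomplete because the CUDA driver was not installed, that CUDA 10.0 needs a driver of at least version 384.00, and that the installer can install the driver with the command shown.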
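- But the nvidia-smi output above clearly shows driver version 440.64, well above 384.00... Maybe it is just a warning, not an error...
- More likely a driver was already installed and I simply chose not to install the bundled driver when running the CUDA installer (the summary says "Driver: Not Selected"), hence the warning.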
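The reasoning above comes down to a version comparison: the installed driver (440.64) against the minimum the warning asks for (384.00). This sketch does the comparison with sort -V, the version sort in GNU coreutils. The name version_ge is a hypothetical helper.

```shell
# Hypothetical helper: succeed if version $1 >= version $2.
# sort -V orders version strings field by field numerically,
# so the smaller of the two comes out first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

On a single-GPU machine you could run version_ge "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)" 384.00 && echo "driver is new enough".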
- Configure environment variables (vim ~/.bashrc):
export CUDA_HOME=/usr/local/cuda-10.0
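# prepend instead of overwriting any existing LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}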
export PATH=${CUDA_HOME}/bin:${PATH}
- Apply the changes
source ~/.bashrc
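The installer summary asks that PATH include /usr/local/cuda-10.0/bin and LD_LIBRARY_PATH include /usr/local/cuda-10.0/lib64. This quick sketch confirms both after sourcing ~/.bashrc. The name path_has is a hypothetical helper. It checks whether a colon-separated list contains a directory as a whole entry, not just as a substring.

```shell
# Hypothetical helper: succeed if the colon-separated list $1
# contains the directory $2 as a complete entry.
path_has() {
    # Wrapping both sides in colons makes first/last entries match too.
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: path_has "$PATH" "$CUDA_HOME/bin" && path_has "$LD_LIBRARY_PATH" "$CUDA_HOME/lib64" && echo ok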
- Check
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
cd /usr/local/cuda/samples/1_Utilities/deviceQuery # adjust to where the samples are on your machine
make
sudo ./deviceQuery
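deviceQuery ends its report with a "Result = PASS" or "Result = FAIL" line. This sketch turns that line into an exit status, reading the report on standard input. The name devicequery_passed is a hypothetical helper.

```shell
# Hypothetical helper: succeed only if the deviceQuery report on stdin
# contains "Result = PASS".
devicequery_passed() {
    grep -q 'Result = PASS'
}
```

Usage: sudo ./deviceQuery | devicequery_passed && echo "CUDA device check passed"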
- Uninstall CUDA
sudo /usr/local/cuda-10.0/bin/uninstall_cuda_10.0.pl
cuDNN
- Install and verify
xieyipeng@debian:~/下载/last/cuda$ sudo cp include/cudnn.h /usr/local/cuda-10.0/include/
[sudo] password for xieyipeng:
xieyipeng@debian:~/下载/last/cuda$ sudo cp lib64/libcudnn* /usr/local/cuda-10.0/lib64/
xieyipeng@debian:~/下载/last/cuda$ sudo chmod a+r /usr/local/cuda-10.0/include/cudnn.h
xieyipeng@debian:~/下载/last/cuda$ sudo chmod a+r /usr/local/cuda-10.0/lib64/libcudnn*
xieyipeng@debian:~/下载/last/cuda$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 2
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
xieyipeng@debian:~/下载/last/cuda$
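The header encodes the version as CUDNN_VERSION = CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL, so 7.4.2 becomes 7402. This sketch reads the three defines from the copied cudnn.h and prints both forms. The name cudnn_version is a hypothetical helper.

```shell
# Hypothetical helper: print "MAJOR.MINOR.PATCH CUDNN_VERSION" from a cudnn.h.
cudnn_version() {
    header=$1
    # Pick the value of each "#define NAME VALUE" line by exact name.
    major=$(awk '$1 == "#define" && $2 == "CUDNN_MAJOR" {print $3}' "$header")
    minor=$(awk '$1 == "#define" && $2 == "CUDNN_MINOR" {print $3}' "$header")
    patch=$(awk '$1 == "#define" && $2 == "CUDNN_PATCHLEVEL" {print $3}' "$header")
    # Same formula as the CUDNN_VERSION macro in the header.
    echo "$major.$minor.$patch $((major * 1000 + minor * 100 + patch))"
}
```

Usage: cudnn_version /usr/local/cuda-10.0/include/cudnn.h should print 7.4.2 7402 for the files copied above.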