Setting up tensorflow-gpu on Ubuntu 16.04, following the official documentation

I. Prepare for CUDA installation

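Official documentation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

That guide is written for CUDA 9.1, while we are installing CUDA 8.0, so the version numbers in the CUDA install commands differ slightly. Everything else can be followed as written.

This section covers the preparation: checking the operating system version, the GPU model, and so on.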

1.1 Verify You Have a CUDA-Capable GPU

To verify that your GPU is CUDA-capable, go to your distribution's equivalent of System Properties, or, from the command line, enter:

$ lspci | grep -i nvidia
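The same check can be scripted. The sketch below only filters text in the shape that `lspci` prints; `find_nvidia_devices` and `read_lspci` are hypothetical helper names, and the sample lines are made up for illustration rather than taken from a real machine.

```python
import subprocess


def find_nvidia_devices(lspci_output):
    """Return the lspci lines that mention NVIDIA, case-insensitively (like `grep -i nvidia`)."""
    return [line for line in lspci_output.splitlines() if "nvidia" in line.lower()]


def read_lspci():
    """Run lspci and return its text output (assumes the lspci tool is installed)."""
    result = subprocess.run(
        ["lspci"], stdout=subprocess.PIPE, universal_newlines=True, check=True
    )
    return result.stdout


# Illustrative input only; real output depends on the hardware.
sample = (
    "00:02.0 VGA compatible controller: Intel Corporation Device 1912\n"
    "01:00.0 VGA compatible controller: NVIDIA Corporation Device 1b80\n"
)
print(find_nvidia_devices(sample))
```

On a real machine, `find_nvidia_devices(read_lspci())` returning an empty list would suggest that no NVIDIA device is visible on the PCI bus.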

The GPU models and families that CUDA currently supports are listed at: https://developer.nvidia.com/cuda-gpus

1.2 Verify You Have a Supported Version of Linux

The CUDA Development Tools are only supported on some specific distributions of Linux. These are listed in the CUDA Toolkit release notes. To determine which distribution and release number you're running, type the following at the command line:

$ uname -m && cat /etc/*release
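For a scripted version of this check, the sketch below parses text in the `KEY=value` format used by /etc/os-release and tests it against this guide's target (Ubuntu 16.04 on x86_64). The helper names are hypothetical and the sample text is an illustrative excerpt, not a full release file.

```python
def parse_os_release(text):
    """Parse KEY=value lines (the /etc/os-release format) into a dict, stripping quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip('"')
    return info


def is_ubuntu_1604_x86_64(os_release, machine):
    """True when the parsed release info and the architecture match this guide's target."""
    return (
        os_release.get("ID") == "ubuntu"
        and os_release.get("VERSION_ID") == "16.04"
        and machine == "x86_64"
    )


# Illustrative excerpt of /etc/os-release on Ubuntu 16.04.
sample = 'NAME="Ubuntu"\nVERSION_ID="16.04"\nID=ubuntu\n'
print(is_ubuntu_1604_x86_64(parse_os_release(sample), "x86_64"))  # prints True
```

On a real system the inputs would come from reading /etc/os-release and from `platform.machine()`, which reports the same architecture string as `uname -m`.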

1.3 Verify the System Has gcc Installed and Check Its Version

The gcc compiler is required for development using the CUDA Toolkit; it is not required for running CUDA applications. gcc is the GNU Compiler Collection (GCC), a suite of compilers for several programming languages, such as C, C++, Ada, and Java. It is generally installed as part of the Linux installation, and in most cases the version of gcc installed with a supported version of Linux will work correctly. To verify the version of gcc installed on your system, type the following on the command line:

$ gcc --version
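If you want to compare the installed gcc against the version listed in the CUDA installation guide from a script, a minimal sketch follows. It assumes the first line of the output has the Ubuntu layout shown in the sample (a parenthesised build string followed by the version); `parse_gcc_version` is a hypothetical helper and the sample line is illustrative.

```python
import re


def parse_gcc_version(first_line):
    """Extract (major, minor, patch) from the first line of `gcc --version` output.

    The version is taken as the dotted triple that follows the closing parenthesis,
    so the build string inside the parentheses is ignored.
    """
    match = re.search(r"\)\s+(\d+)\.(\d+)\.(\d+)", first_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


# Illustrative first line; the exact build string differs between systems.
sample = "gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609"
print(parse_gcc_version(sample))  # prints (5, 4, 0)
```

Returning a tuple of integers makes comparisons such as `parse_gcc_version(line) >= (5, 0, 0)` straightforward.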

1.4 Verify the System Has the Correct Kernel Headers and Development Packages Installed

The kernel headers and development packages only need to match the version of the running kernel.

The CUDA Driver requires that the kernel headers and development packages for the running version of the kernel be installed at the time of the driver installation, as well as whenever the driver is rebuilt. For example, if your system is running kernel version 3.17.4-301, the 3.17.4-301 kernel headers and development packages must also be installed.

While the Runfile installation performs no package validation, the RPM and Deb installations of the driver will make an attempt to install the kernel header and development packages if no version of these packages is currently installed. However, it will install the latest version of these packages, which may or may not match the version of the kernel your system is using. Therefore, it is best to manually ensure the correct version of the kernel headers and development packages are installed prior to installing the CUDA Drivers, as well as whenever you change the kernel version.

The version of the kernel your system is running can be found by running the following command:

$ uname -r

The kernel headers and development packages for the currently running kernel can be installed with:

$ sudo apt-get install linux-headers-$(uname -r)
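To confirm afterwards that headers matching the running kernel are present, a rough sketch is shown below. It assumes the Ubuntu convention that the linux-headers-<release> package unpacks into /usr/src/linux-headers-<release>; the helper names are hypothetical.

```python
import os
import platform


def headers_path(kernel_release):
    """Directory where Ubuntu's linux-headers-<release> package is assumed to unpack."""
    return "/usr/src/linux-headers-" + kernel_release


def headers_installed(kernel_release):
    """True if the headers directory for the given kernel release exists."""
    return os.path.isdir(headers_path(kernel_release))


release = platform.release()  # same string as `uname -r`
print(release, headers_installed(release))
```

If this prints False for the running kernel, rerun the apt-get command above before installing the driver.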

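II. Download and Install CUDA Toolkit 8.0

(Note: at the time of writing, TensorFlow 1.3 supports only CUDA Toolkit 8.0 with cuDNN 6.0.)

Before installing, check the TensorFlow website for the CUDA and cuDNN versions it currently supports. If you install the newest release and TensorFlow does not support it, you will have to uninstall everything and start over.

TensorFlow website: https://www.tensorflow.org/install/install_linux?hl=zh-cn#prepare_your_environment The supported versions are listed below; newer versions do not work: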

  • CUDA® Toolkit 8.0. For details, see NVIDIA's documentation. Ensure that you append the relevant Cuda pathnames to the LD_LIBRARY_PATH environment variable as described in the NVIDIA documentation.
  • The NVIDIA drivers associated with CUDA Toolkit 8.0.
  • cuDNN v6.0. For details, see NVIDIA's documentation. Ensure that you create the CUDA_HOME environment variable as described in the NVIDIA documentation.

2.1 Download CUDA Toolkit (make sure to download CUDA 8.0)

https://developer.nvidia.com/cuda-80-ga2-download-archive   

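Select Linux > x86_64 > Ubuntu > 16.04 > deb (local).

2.2 Install CUDA Toolkit 8.0

Enter the following installation instructions in a terminal, one after another.

Use cd to change to the folder containing the downloaded file, then run these commands to install CUDA: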
  1. `$ sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb`
  2. `$ sudo apt-get update`
  3. `$ sudo apt-get install cuda`

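******** If the commands above installed a newer release such as cuda-9-0 instead of cuda-8-0, fix it as follows **********

I had previously installed the newer CUDA 9.1, found that TensorFlow did not support it, and uninstalled and purged it. Reinstalling with the three commands above still pulled in cuda-9.0 automatically rather than the cuda-8.0 I wanted.

Reference for the fix: https://devtalk.nvidia.com/default/topic/1024342/cuda-setup-and-installation/unable-to-uninstall-cuda-9-0-completely-and-install-8-0-instead/

In summary:

First uninstall the newer CUDA 9.1 that is already installed: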

$ sudo apt-get --purge remove cuda

$ sudo apt autoremove

Then clean the apt cache:

$ sudo apt-get clean

Finally reinstall, this time specifying the CUDA version:

$ sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb

$ sudo apt-get update

$ sudo apt-get install cuda-8-0

Done!

******************************************
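To see which CUDA packages actually ended up installed, you can inspect the output of `dpkg -l`. The sketch below parses that listing. It assumes the usual column layout (status, name, version, ...), `installed_cuda_packages` is a hypothetical helper, and the sample rows are illustrative rather than copied from a real system.

```python
def installed_cuda_packages(dpkg_output):
    """Map installed package names starting with 'cuda' to their versions."""
    packages = {}
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Fully installed packages are listed with the status 'ii'.
        if len(fields) >= 3 and fields[0] == "ii":
            name = fields[1].split(":")[0]  # drop an architecture suffix such as ':amd64'
            if name.startswith("cuda"):
                packages[name] = fields[2]
    return packages


# Illustrative rows in the style of `dpkg -l` output.
sample = (
    "ii  cuda-8-0        8.0.61-1   amd64  CUDA 8.0 meta-package\n"
    "rc  cuda-9-0        9.0.176-1  amd64  CUDA 9.0 meta-package\n"
    "ii  gcc             4:5.3.1    amd64  GNU C compiler\n"
)
print(installed_cuda_packages(sample))  # prints {'cuda-8-0': '8.0.61-1'}
```

A result that contains cuda-9-0 would mean the unversioned `cuda` package name resolved to the newer release again.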

2.3 Environment Setup

Open the .bashrc file in your home directory (it is a hidden file, so press Ctrl+H in the file manager to show hidden files first) and append the following lines at the end:

export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}

export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

# Note: the path below must match your NVIDIA driver version. Run `cat /proc/driver/nvidia/version` in a terminal to see the driver version.

export LPATH=/usr/lib/nvidia-387:$LPATH

export LIBRARY_PATH=/usr/lib/nvidia-387:$LIBRARY_PATH

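Note: apart from the space after export, these lines must not contain any unnecessary spaces. The shell is whitespace-sensitive here and will not recognize the assignment otherwise.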
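The `${PATH:+:${PATH}}` part of the first two export lines is a shell parameter expansion: it adds a colon plus the old value only when the variable is already set and non-empty. That avoids a trailing colon, which the shell would treat as an empty entry meaning the current directory. The Python sketch below mimics that behaviour for illustration; `prepend_path` is a hypothetical name, not part of any tool.

```python
def prepend_path(new_dir, current):
    """Mimic NEW${VAR:+:${VAR}}: append ':' + current only if current is set and non-empty."""
    if current:
        return new_dir + ":" + current
    return new_dir


print(prepend_path("/usr/local/cuda-8.0/bin", "/usr/bin:/bin"))
# prints /usr/local/cuda-8.0/bin:/usr/bin:/bin
print(prepend_path("/usr/local/cuda-8.0/lib64", ""))
# prints /usr/local/cuda-8.0/lib64
```

The second call shows the case that matters for LD_LIBRARY_PATH, which is often unset on a fresh system.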

2.4 Test whether CUDA installed successfully by checking the nvcc compiler version

$ nvcc -V
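The line of interest in the output is the one naming the release. As a sketch, the release number can be pulled out as shown below. The sample line is illustrative and assumes the output contains the phrase "release X.Y"; `parse_nvcc_release` is a hypothetical helper.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return the CUDA release reported by `nvcc -V` as (major, minor), or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


# Illustrative last line of `nvcc -V` output for CUDA 8.0.
sample = "Cuda compilation tools, release 8.0, V8.0.61"
print(parse_nvcc_release(sample))  # prints (8, 0)
```

For this guide the expected result is (8, 0). Anything else suggests that PATH points at a different CUDA installation.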

 

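III. Install cuDNN (Deep Neural Network library)

Official documentation: http://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html

3.1 Download cuDNN (make sure to download cuDNN 6.0)

It is worth the small hassle of registering (joining) the developer program, after which the download is free. When downloading, choose the cuDNN 6.0 build that matches your Ubuntu version and CUDA version.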

https://developer.nvidia.com/rdp/form/cudnn-download-survey

3.2 Install cuDNN

  • Navigate to your <cudnnpath> directory containing the cuDNN Debian files, that is, cd into the directory where you downloaded the three files, and install them in order.
  • Install the runtime library, for example:
$ sudo dpkg -i libcudnn6_6.0.3.11-1+cuda8.0_amd64.deb
  • Install the developer library, for example:
$ sudo dpkg -i libcudnn6-dev_6.0.3.11-1+cuda8.0_amd64.deb
  • Install the code samples and the cuDNN Library User Guide, for example:
$ sudo dpkg -i libcudnn6-doc_6.0.3.11-1+cuda8.0_amd64.deb

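The version in the 'libcudnn6-...' file names after sudo dpkg -i should match the files you actually downloaded.

Summary: cuDNN is installed simply by dropping files onto your system; no environment variables need to be configured.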
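One way to double-check which cuDNN version landed on the system is to read the version macros from the installed cudnn.h header. The sketch below parses header text for `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL`. The header's location depends on how cuDNN was installed, so the code takes the text as input instead of assuming a path; the sample text and the helper name are illustrative.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the CUDNN_* #define lines, or None if one is missing."""
    version = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            r"^\s*#define\s+" + name + r"\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        version.append(int(match.group(1)))
    return tuple(version)


# Illustrative excerpt in the style of cudnn.h; the patch level is made up.
sample = (
    "#define CUDNN_MAJOR      6\n"
    "#define CUDNN_MINOR      0\n"
    "#define CUDNN_PATCHLEVEL 21\n"
)
print(parse_cudnn_version(sample))  # prints (6, 0, 21)
```

For this guide the first two numbers should be 6 and 0.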

 

IV. Install tensorflow-gpu

Official reference: https://www.tensorflow.org/install/install_linux?hl=zh-cn#prepare_your_environment

4.1 Prepare

TensorFlow also requires the libcupti-dev library, the NVIDIA CUDA Profile Tools Interface, which provides advanced profiling support. To install this library, issue the following command:

sudo apt-get install libcupti-dev

4.2 Install tensorflow-gpu with native pip

sudo apt-get install python3-pip python3-dev # for Python 3.n

pip3 install tensorflow-gpu # Python 3.n; GPU support 

(Optional.) If the step above, ‘$ pip3 install tensorflow-gpu’, failed, install the latest version of TensorFlow by issuing a command of the following format:

sudo pip3 install --upgrade tfBinaryURL   # Python 3.n 

where tfBinaryURL identifies the URL of the TensorFlow Python package. The appropriate value of tfBinaryURL depends on the operating system, Python version, and GPU support; the values are listed in the TensorFlow installation guide referenced above. For example, to install TensorFlow for Linux, Python 3.4, and CPU-only support, issue the following command:

sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.4.0-cp34-cp34m-linux_x86_64.whl
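The file name at the end of such a URL follows the standard wheel naming scheme (distribution, version, Python tag, ABI tag, platform tag), which is how you can tell whether a given tfBinaryURL matches your Python version. The sketch below splits the example URL above into those parts; `parse_wheel_url` is a hypothetical helper, and it deliberately ignores wheels that carry an optional build tag.

```python
def parse_wheel_url(url):
    """Split the wheel file name at the end of a URL into its five standard components."""
    filename = url.rsplit("/", 1)[-1]
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("unexpected wheel file name: " + filename)
    name, version, python_tag, abi_tag, platform_tag = parts
    return {
        "name": name,
        "version": version,
        "python_tag": python_tag,  # e.g. cp34 means CPython 3.4
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


url = (
    "https://storage.googleapis.com/tensorflow/linux/cpu/"
    "tensorflow-1.4.0-cp34-cp34m-linux_x86_64.whl"
)
print(parse_wheel_url(url))
```

On Ubuntu 16.04 the system python3 is Python 3.5, so a wheel tagged cp34 like the one in the example would not match it.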

4.3 As in section 2.3, append one more environment variable to .bashrc

# Environment variable required by TensorFlow

export CUDA_HOME=/usr/local/cuda-8.0

4.4 Test whether tensorflow-gpu is configured correctly by running a short snippet

$ python3

# Inside the Python interpreter

>>> import tensorflow as tf

>>> hello = tf.constant("hello, tensorflow")

>>> sess = tf.Session()
>>> print(sess.run(hello))

If this prints "hello, tensorflow" (Python 3 displays it as b'hello, tensorflow'), everything works. Congratulations!
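The hello test confirms that TensorFlow runs, but it does not by itself show that a GPU is being used. A further check, sketched below, is to ask TensorFlow 1.x for its local devices through `device_lib.list_local_devices()` and keep the ones whose `device_type` is "GPU". The helper names are hypothetical, and the `FakeDevice` objects are stand-ins so the filtering logic can be shown without TensorFlow installed.

```python
from collections import namedtuple


def gpu_device_names(devices):
    """Return the names of the devices whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def list_tensorflow_gpus():
    """Ask TensorFlow 1.x which GPUs it can see (TensorFlow is imported only when called)."""
    from tensorflow.python.client import device_lib

    return gpu_device_names(device_lib.list_local_devices())


# Stand-in objects for illustration; real entries come from device_lib.
FakeDevice = namedtuple("FakeDevice", ["name", "device_type"])
sample = [FakeDevice("/device:CPU:0", "CPU"), FakeDevice("/device:GPU:0", "GPU")]
print(gpu_device_names(sample))  # prints ['/device:GPU:0']
```

On the configured machine, an empty list from `list_tensorflow_gpus()` would mean TensorFlow sees only the CPU.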

 

Appendix: Errors I ran into and their fixes

1. Everything was installed, but running TensorFlow failed because its native runtime could not be loaded:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 23, in <module>
    from tensorflow.python import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
    _pywrap_tensorflow = swig_import_helper()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)

ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory

Cause: I had installed CUDA 9.0, but then realized that TensorFlow 1.3 does not yet support it.

Fix:

# I took the following steps to remove CUDA 9.0
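A quick way to reproduce this failure outside TensorFlow is to ask the dynamic loader for the same library directly. The sketch below uses `ctypes.CDLL`, which raises `OSError` when the shared object cannot be found or opened; `can_load` is a hypothetical helper name.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can find and open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


# The library named in the ImportError above.
print(can_load("libcudart.so.8.0"))
```

If this prints False even after CUDA 8.0 is installed, the likely culprit is the LD_LIBRARY_PATH setting from section 2.3.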

$ sudo apt-get --purge remove cuda

$ sudo apt autoremove

# Then clear apt-cache

$ sudo apt-get clean

# Then I tried the following steps to reinstall CUDA 8.0

$ sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb

$ sudo apt-get update

$ sudo apt-get install cuda

I then hit the problem again: I had uninstalled CUDA 9.0, but whenever I tried to install 8.0, 9.0 kept getting installed instead. How do I prevent this from happening and install 8.0?

NVIDIA's answer: uninstall once more, then specify the CUDA version in the last of the three install commands:

$ sudo apt-get install cuda-8-0

Other references:

https://segmentfault.com/a/1190000008234390

Reposted from blog.csdn.net/passball/article/details/82817779