Installing Caffe + CUDA 8.0 + cuDNN (GPU version) on Ubuntu 16.04

This blog has moved to https://leerw.github.io. You are welcome to visit so we can share ideas and improve together.
Preface

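At first I ran Ubuntu in a VirtualBox virtual machine on Windows 10, and for the sake of looks I even installed Ubuntu 17.04 GNOME. It had only just been released, so it had plenty of bugs and I hit a lot of pitfalls; later I switched to Ubuntu 16.04 LTS. A virtual machine cannot use the GPU version, though, so everything was painfully slow. I put up with it, since it was only a VM. Then, the day before yesterday, MATLAB 2017a on Windows 10 pushed my CPU usage to 97.5% within minutes of opening it. I hoped a reboot would help, but it crashed my Windows 10 installation instead, probably because background processes had not exited before the abnormal shutdown. So I went all in and set up a dual-boot system, which also meant configuring everything from scratch. Honestly, if I did not still need Windows for some of my work, I would stop using it altogether. Enough chatter; let's start installing Caffe.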

Setup:
Windows 10 + Ubuntu 16.04 dual boot
CPU: i5-4300H
GPU: NVIDIA GTX950M

Official guides:
[caffe installation guideline](http://caffe.berkeleyvision.org/install_apt.html)
[caffe installation guideline(dependencies installation)](http://caffe.berkeleyvision.org/install_apt.html)

1. Install the NVIDIA graphics driver
My Ubuntu installation is fresh, so I start by updating the package lists:

sudo apt update

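Ubuntu 16.04 LTS uses Nouveau as the GPU driver by default, so we first need to install NVIDIA's driver.
Open Software & Updates, select the NVIDIA driver, apply the changes, and reboot once they have finished.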
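After the reboot it can be worth confirming which kernel driver is actually in use. The sketch below is one way to do that, assuming `lsmod` is available on your system. The helper name `gpu_driver_from_lsmod` is made up for this example and is not part of any NVIDIA tool.

```shell
# Hypothetical helper: read `lsmod`-style output on stdin and report
# which GPU kernel module is loaded (nvidia, nouveau, or none).
gpu_driver_from_lsmod() {
    awk '
        $1 == "nvidia"  { found_nvidia = 1 }
        $1 == "nouveau" { found_nouveau = 1 }
        END {
            if (found_nvidia)       print "nvidia"
            else if (found_nouveau) print "nouveau"
            else                    print "none"
        }
    '
}
```

Used as `lsmod | gpu_driver_from_lsmod`, it should print nvidia once the switch away from Nouveau has worked.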

2. Install CUDA
Download CUDA
I chose linux-x86_64_Ubuntu16.04-runfile(local)-Base Installer.
Once the download finishes, run:

sudo sh cuda_8.0.27_linux.run

The installer then asks whether to install the graphics driver:

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 375.26?

Answer no (we have already installed a newer version), then accept the defaults for everything else.

Logging to /tmp/cuda_install_27233.log
Using more to view the EULA.
End User License Agreement
--------------------------
License
...
...
--------------------------
Do you accept the previously read EULA?
accept/decline/quit:accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 375.26?
(y)es/(n)o/(q)uit: n
# because I have already installed version 375.66
Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
 [ default is /usr/local/cuda-8.0 ]:
# I chose the default to avoid unexpected errors
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
 [ default is /home/leerw ]:
# default too

Some people hit an "unsupported compiler" error during this step, caused by a g++ version that is too new. I did not run into it; if you do, see xuzhongxiong's blog.
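If you would rather check the compiler before running the installer, something like the following could help. It assumes the problem is a g++ major version newer than the toolkit accepts. The limit of 5 used here is my assumption for CUDA 8.0 and should be checked against NVIDIA's installation guide; `compiler_major` and `compiler_supported` are names invented for this sketch.

```shell
# Hypothetical helper: extract the major version from a version string
# such as "5.4.0" (for example, taken from `g++ -dumpversion`).
compiler_major() {
    printf '%s\n' "${1%%.*}"
}

# Assumption: CUDA 8.0 accepts g++ major versions up to 5.
MAX_SUPPORTED_MAJOR=5

# Returns 0 when the given version is at or below the assumed limit.
compiler_supported() {
    [ "$(compiler_major "$1")" -le "$MAX_SUPPORTED_MAJOR" ]
}
```

A possible use is `compiler_supported "$(g++ -dumpversion)" || echo "g++ may be too new for CUDA 8.0"`.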

After the installation finishes, set up the environment variables:

sudo vi /etc/profile

Add these two lines:

export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
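One small wrinkle with plain `export` lines is that sourcing the profile repeatedly keeps prepending the same directory. Here is a minimal sketch of an idempotent alternative; `path_prepend` is a hypothetical helper, not something CUDA ships.

```shell
# Hypothetical helper: prepend a directory to a colon-separated list
# only if it is not already present, and print the resulting list.
path_prepend() {
    local dir="$1" list="$2"
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;                  # already present
        *)          printf '%s\n' "${dir}${list:+:$list}" ;;  # add at the front
    esac
}
```

For example, `export PATH="$(path_prepend /usr/local/cuda-8.0/bin "$PATH")"`. It also avoids the trailing colon that the plain form leaves behind when LD_LIBRARY_PATH starts out empty.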

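Reload the profile so the variables take effect in the current shell, then refresh the linker cache:

source /etc/profile
sudo ldconfig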

Here we may get an error:

/sbin/ldconfig.real: /usr/lib/nvidia-375/libEGL.so.1 is not a symbolic link

/sbin/ldconfig.real: /usr/lib32/nvidia-375/libEGL.so.1 is not a symbolic link

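So how do we fix this error? It may be caused by a conflict between different versions of libEGL. The fix is to replace the files with symbolic links; note that the version suffix of the link target (375.39 below) has to match the file that actually exists in those directories: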

sudo mv /usr/lib/nvidia-375/libEGL.so.1 /usr/lib/nvidia-375/libEGL.so.1.org
sudo mv /usr/lib32/nvidia-375/libEGL.so.1 /usr/lib32/nvidia-375/libEGL.so.1.org
sudo ln -s /usr/lib/nvidia-375/libEGL.so.375.39 /usr/lib/nvidia-375/libEGL.so.1
sudo ln -s /usr/lib32/nvidia-375/libEGL.so.375.39 /usr/lib32/nvidia-375/libEGL.so.1

This is a known, previously reported issue.
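Since the link target carries a driver version in its name, it may be safer to look up the file that is really on disk instead of typing the version by hand. This is only a sketch, assuming the versioned library follows the `libEGL.so.<major>.<minor>` naming seen in the commands above; `find_versioned_lib` is a name invented here.

```shell
# Hypothetical helper: print the versioned library file (for example
# libEGL.so.375.39) found in a directory, or fail if there is none.
find_versioned_lib() {
    local dir="$1" name="$2" candidate
    for candidate in "$dir/$name".[0-9]*.[0-9]*; do
        # Skip an unexpanded glob and any symlinks; we want the real file.
        if [ -f "$candidate" ] && [ ! -L "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

The link could then be created with `sudo ln -s "$(find_versioned_lib /usr/lib/nvidia-375 libEGL.so)" /usr/lib/nvidia-375/libEGL.so.1`.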

Next, let's test the installation.
Before that, install some libraries that the next step needs:

sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev  libglu1-mesa libglu1-mesa-dev libgl1-mesa-glx

OK, let's start the test:

cd /usr/local/cuda/samples
ls
0_Simple     2_Graphics  4_Finance      6_Advanced       common    Makefile
1_Utilities  3_Imaging   5_Simulations  7_CUDALibraries  EULA.txt
sudo make -j8
make[1]: Entering directory '/usr/local/cuda-8.0/samples/0_Simple/matrixMul_nvrtc'
make[1]: Entering directory '/usr/local/cuda-8.0/samples/0_Simple/simpleZeroCopy'
make[1]: Entering directory '/usr/local/cuda-8.0/samples/0_Simple/simpleMultiGPU'
make[1]: Entering directory '/usr/local/cuda-8.0/samples/0_Simple/simplePitchLinearTexture'
...
...
make[1]: Leaving directory '/usr/local/cuda-8.0/samples/3_Imaging/dxtc'
cd ./bin/x86_64/linux/release/
./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 950M"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 2003 MBytes (2100232192 bytes)
  ( 5) Multiprocessors, (128) CUDA Cores/MP:     640 CUDA Cores
  GPU Max Clock rate:                            1124 MHz (1.12 GHz)
  Memory Clock rate:                             1001 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 950M
Result = PASS
// if you see this, congratulations! 
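If you want to script this check rather than read the output by eye, one option is to look for the final result line. A minimal sketch, assuming deviceQuery keeps printing "Result = PASS" as in the output above; `device_query_passed` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the text on stdin contains the
# "Result = PASS" line that deviceQuery prints at the end of its output.
device_query_passed() {
    grep -q '^Result = PASS'
}
```

For example: `./deviceQuery | device_query_passed && echo "CUDA is working"`.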

3. Install cuDNN
Download it from the official site.

    I selected the version at "https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v7/#prod/8.0_20170802/cudnn-8.0-linux-x64-v7-tgz" because my CUDA version is 8.0.
    Extract the tar.gz, which gives a folder named "cuda".
    Copy some files into the local CUDA installation:
// in the "cuda" folder
sudo cp lib64/lib* /usr/local/cuda/lib64/
sudo cp include/cudnn.h /usr/local/cuda/include/
cd /usr/local/cuda/lib64/
sudo chmod +r libcudnn.so.7.0.1
# create the symbolic links
sudo ln -sf libcudnn.so.7.0.1 libcudnn.so.7
sudo ln -sf libcudnn.so.7 libcudnn.so
sudo ldconfig

cuDNN is now installed.
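To double-check which cuDNN version ended up in the CUDA include directory, you can read the version macros from the header. The sketch below assumes cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, which I believe is the case for cuDNN 7; `cudnn_version_from_header` is a made-up helper name.

```shell
# Hypothetical helper: read the CUDNN_MAJOR, CUDNN_MINOR and
# CUDNN_PATCHLEVEL macros from a cudnn.h file and print "major.minor.patch".
cudnn_version_from_header() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$1"
}
```

Running `cudnn_version_from_header /usr/local/cuda/include/cudnn.h` should then agree with the version in the library file name used above.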

4. Install OpenCV 3

We get the OpenCV 3 source from Git so that we have the latest version:

git clone https://github.com/opencv/opencv
Cloning into 'opencv'...
remote: Counting objects: 210094, done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 210094 (delta 0), reused 1 (delta 0), pack-reused 210089
Receiving objects: 100% (210094/210094), 429.85 MiB | 245.00 KiB/s, done.
Resolving deltas: 100% (145370/145370), done.
Checking connectivity... done.
Checking out files: 100% (5371/5371), done.

Next we start the build:

// in the "opencv" folder
mkdir build
cd build/
// this assumes cmake is already installed; if not, it is easy to find and install from Ubuntu Software Center
cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local ..
You may see an error:
...compiling...
-- Checking for module 'libavresample'
--   No package 'libavresample' found
-- Checking for module 'libgphoto2'
--   No package 'libgphoto2' found
-- IPPICV: Download: ippicv_2017u2_lnx_intel64_20170418.tgz
-- 
=======================================================================
  Couldn't download files from the Internet.
  Please check the Internet access on this host.
=======================================================================

CMake Warning at cmake/OpenCVDownload.cmake:188 (message):
  IPPICV: Download failed: 6;"Couldn't resolve host name"

  For details please refer to the download log file:

  /media/leerw/办公/ubuntu_caffe/opencv/build/CMakeDownloadLog.txt
# let's look at the log file
############################ it shows the following ############################
use_cache "/media/leerw/办公/ubuntu_caffe/opencv/.cache"
do_unpack "ippicv_2017u2_lnx_intel64_20170418.tgz" "87cbdeb627415d8e4bc811156289fa3a" "https://raw.githubusercontent.com/opencv/opencv_3rdparty/a62e20676a60ee0ad6581e217fe7e4bada3b95db/ippicv/ippicv_2017u2_lnx_intel64_20170418.tgz" "/media/leerw/办公/ubuntu_caffe/opencv/build/3rdparty/ippicv"
#check_md5 "/media/leerw/办公/ubuntu_caffe/opencv/.cache/ippicv/87cbdeb627415d8e4bc811156289fa3a-ippicv_2017u2_lnx_intel64_20170418.tgz"
#mismatch_md5 "/media/leerw/办公/ubuntu_caffe/opencv/.cache/ippicv/87cbdeb627415d8e4bc811156289fa3a-ippicv_2017u2_lnx_intel64_20170418.tgz" "d41d8cd98f00b204e9800998ecf8427e"
#delete "/media/leerw/办公/ubuntu_caffe/opencv/.cache/ippicv/87cbdeb627415d8e4bc811156289fa3a-ippicv_2017u2_lnx_intel64_20170418.tgz"
#cmake_download "/media/leerw/办公/ubuntu_caffe/opencv/.cache/ippicv/87cbdeb627415d8e4bc811156289fa3a-ippicv_2017u2_lnx_intel64_20170418.tgz" "https://raw.githubusercontent.com/opencv/opencv_3rdparty/a62e20676a60ee0ad6581e217fe7e4bada3b95db/ippicv/ippicv_2017u2_lnx_intel64_20170418.tgz"
#######################################################################

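The download failed. No problem, we can download the archive manually (thanks to this page, a great help).
The log shows the cache file under opencv/.cache/ippicv/ with the MD5 hash as a name prefix, so copy the archive there under that name:

cp ippicv_2017u2_lnx_intel64_20170418.tgz opencv/.cache/ippicv/87cbdeb627415d8e4bc811156289fa3a-ippicv_2017u2_lnx_intel64_20170418.tgz
// in the opencv/build/ folder: rerun the cmake command above, then build and install
make -j8
sudo make install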
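The log above is worth a second look. The expected hash is 87cbdeb627415d8e4bc811156289fa3a, while the mismatching one, d41d8cd98f00b204e9800998ecf8427e, is the MD5 of an empty file, which fits a download that never received any data. Before copying a manually downloaded archive into the cache, you could verify it along these lines. This is a sketch assuming `md5sum` from coreutils is available; `md5_matches` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the MD5 checksum of a file equals the
# expected hex string.
md5_matches() {
    local file="$1" expected="$2" actual
    actual="$(md5sum "$file" | awk '{ print $1 }')"
    [ "$actual" = "$expected" ]
}
```

For example: `md5_matches ippicv_2017u2_lnx_intel64_20170418.tgz 87cbdeb627415d8e4bc811156289fa3a || echo "archive is corrupt or incomplete"`.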

OpenCV 3 is now built, so let's test it from Python.
I am using the stock Python 2.7.

>>> import cv2
 Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
 ImportError: No module named cv2

This is because the opencv-python package is not installed:

pip install opencv-python
 Collecting opencv-python
  Downloading opencv_python-3.3.0.9-cp27-cp27mu-manylinux1_x86_64.whl (8.8MB)
    100% |████████████████████████████████| 8.8MB 68kB/s 
 Collecting numpy>=1.11.1 (from opencv-python)
  Downloading numpy-1.13.1-cp27-cp27mu-manylinux1_x86_64.whl (16.6MB)
    100% |████████████████████████████████| 16.6MB 50kB/s 
 Installing collected packages: numpy, opencv-python
 Successfully installed numpy opencv-python

If you do not have pip, run the following first to get it:

sudo apt install python-pip

Then test again:

>>> import cv2
>>> print("opencv installation is succeed!")
opencv installation is succeed!

5. Install Caffe

First, clone it from Git:

git clone https://github.com/BVLC/caffe

Install Boost:

sudo apt-get install -y --no-install-recommends libboost-all-dev

Install BLAS:

sudo apt-get install libatlas-base-dev

Install the other required libraries:

sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libboost-all-dev libhdf5-serial-dev \
libgflags-dev libgoogle-glog-dev liblmdb-dev protobuf-compiler
sudo pip install scikit-image protobuf

I got an error:

Collecting scikit-image
  Downloading scikit_image-0.13.0-cp27-cp27mu-manylinux1_x86_64.whl (33.7MB)
    85% |███████████████████████████▍    | 28.9MB 32kB/s eta 0:02:32Exception:
Traceback (most recent call last):
  File "/home/leerw/.local/lib/python2.7/site-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/home/leerw/.local/lib/python2.7/site-packages/pip/commands/install.py", line 324, in run
    requirement_set.prepare_files(finder)
  File "/home/leerw/.local/lib/python2.7/site-packages/pip/req/req_set.py", line 380, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "/home/leerw/.local/lib/python2.7/site-packages/pip/req/req_set.py", line 620, in _prepare_file
    session=self.session, hashes=hashes)
  File "/home/leerw/.local/lib/python2.7/site-packages/pip/download.py", line 821, in unpack_url
    hashes=hashes
  File "/home/leerw/.local/lib/python2.7/site-packages/pip/download.py", line 659, in unpack_http_url
    hashes)
  File "/home/leerw/.local/lib/python2.7/site-packages/pip/download.py", line 882, in _download_http_url
    _download_url(resp, link, content_file, hashes)
  File "/home/leerw/.local/lib/python2.7/site-packages/pip/download.py", line 603, in _download_url
    hashes.check_against_chunks(downloaded_chunks)
  File "/home/leerw/.local/lib/python2.7/site-packages/pip/utils/hashes.py", line 46, in check_against_chunks
    for chunk in chunks:
  File "/home/leerw/.local/lib/python2.7/site-packages/pip/download.py", line 571, in written_chunks
    for chunk in chunks:
  File "/home/leerw/.local/lib/python2.7/site-packages/pip/utils/ui.py", line 139, in iter
    for x in it:
  File "/home/leerw/.local/lib/python2.7/site-packages/pip/download.py", line 560, in resp_read
    decode_content=False):
  File "/home/leerw/.local/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/response.py", line 357, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/home/leerw/.local/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/response.py", line 324, in read
    flush_decoder = True
  File "/usr/lib/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/leerw/.local/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/response.py", line 246, in _error_catcher
    raise ReadTimeoutError(self._pool, None, 'Read timed out.')
ReadTimeoutError: HTTPSConnectionPool(host='pypi.python.org', port=443): Read timed out.

No problem, it was just a network issue; retrying fixed it.
Then go to caffe/python and install all the necessary Python packages:

for req in $(cat requirements.txt); do sudo pip install $req; done
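Because pip downloads can time out as shown above, it may be convenient to wrap them in a small retry loop instead of rerunning by hand. A minimal sketch; `retry` is a helper written for this example, not a pip feature.

```shell
# Hypothetical helper: run a command up to N times, stopping at the
# first success. Returns non-zero if every attempt fails.
retry() {
    local attempts="$1" i=1
    shift
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        echo "attempt $i of $attempts failed: $*" >&2
        i=$((i + 1))
    done
    return 1
}
```

For example: `retry 3 sudo pip install scikit-image protobuf`, or `retry 3 sudo pip install "$req"` inside the loop above.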

Next we need to modify the Makefile and Makefile.config in the caffe directory.
Start with Makefile.config: first copy the contents of Makefile.config.example into Makefile.config.

// in the "caffe" folder
cp Makefile.config.example Makefile.config
# USE_CUDNN := 1
Uncomment it so that it reads:
USE_CUDNN := 1

# OPENCV_VERSION := 3
Uncomment it so that it reads:
OPENCV_VERSION := 3

PYTHON_INCLUDE := /usr/include/python2.7 \
        /usr/lib/python2.7/dist-packages/numpy/core/include
Change it to:
PYTHON_INCLUDE := /usr/include/python2.7 \
        /usr/local/lib/python2.7/dist-packages/numpy/core/include

# WITH_PYTHON_LAYER := 1
Uncomment it so that it reads:
WITH_PYTHON_LAYER := 1

INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
Change it to:
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/
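The "uncomment this line" edits can also be done from the command line instead of an editor. The following is a sketch under the assumption that the commented settings look exactly like `# KEY := value` and that GNU sed is in use; `uncomment_setting` is a hypothetical helper, not part of Caffe's build system.

```shell
# Hypothetical helper: uncomment a "# KEY := value" line in a Makefile-style
# config file, editing the file in place (relies on GNU sed's -i option).
uncomment_setting() {
    local file="$1" key="$2"
    sed -i "s/^#[[:space:]]*\(${key}[[:space:]]*:=.*\)\$/\1/" "$file"
}
```

For example: `uncomment_setting Makefile.config USE_CUDNN`, and likewise for OPENCV_VERSION and WITH_PYTHON_LAYER. The PYTHON_INCLUDE and INCLUDE_DIRS changes still need to be made by hand.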

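Next is the Makefile.

Change the LIBRARIES line, replacing hdf5_hl hdf5 with their serial variants, so that it reads: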
LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_serial_hl hdf5_serial

Now run make:

make all -j8
make test -j8
make runtest -j8
############################ if you see this, congratulations! #################################
[       OK ] ProtoTest.TestSerialization (0 ms)
[----------] 1 test from ProtoTest (0 ms total)

[----------] 2 tests from GemmTest/1, where TypeParam = double
[ RUN      ] GemmTest/1.TestGemvCPUGPU
[       OK ] GemmTest/1.TestGemvCPUGPU (1 ms)
[ RUN      ] GemmTest/1.TestGemmCPUGPU
[       OK ] GemmTest/1.TestGemmCPUGPU (0 ms)
[----------] 2 tests from GemmTest/1 (1 ms total)

[----------] 3 tests from TanHLayerTest/1, where TypeParam = caffe::CPUDevice<double>
[ RUN      ] TanHLayerTest/1.TestTanH
[       OK ] TanHLayerTest/1.TestTanH (0 ms)
[ RUN      ] TanHLayerTest/1.TestTanHOverflow
[       OK ] TanHLayerTest/1.TestTanHOverflow (0 ms)
[ RUN      ] TanHLayerTest/1.TestTanHGradient
[       OK ] TanHLayerTest/1.TestTanHGradient (2 ms)
[----------] 3 tests from TanHLayerTest/1 (2 ms total)

[----------] 3 tests from ThresholdLayerTest/1, where TypeParam = caffe::CPUDevice<double>
[ RUN      ] ThresholdLayerTest/1.TestSetup
[       OK ] ThresholdLayerTest/1.TestSetup (0 ms)
[ RUN      ] ThresholdLayerTest/1.Test
[       OK ] ThresholdLayerTest/1.Test (0 ms)
[ RUN      ] ThresholdLayerTest/1.Test2
[       OK ] ThresholdLayerTest/1.Test2 (0 ms)
[----------] 3 tests from ThresholdLayerTest/1 (0 ms total)

[----------] Global test environment tear-down
[==========] 2101 tests from 277 test cases ran. (381562 ms total)
[  PASSED  ] 2101 tests.
################################# our caffe installation is over################################

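Finally, let's set up pycaffe:

vim ~/.bashrc

Append the following at the end of the file:

export PYTHONPATH=/media/leerw/办公/caffe/python:$PYTHONPATH
/* Replace /media/leerw/办公/caffe/python with your own path. Note that the path must not contain Chinese characters, or you will suffer the pain I describe at the end of this post. */

Apply the change: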

source ~/.bashrc
make clean
make pycaffe -j8

If you run into the following error:

CXX/LD -o python/caffe/_caffe.so python/caffe/_caffe.cpp
python/caffe/_caffe.cpp:10:31: fatal error: numpy/arrayobject.h: No such file or directory
compilation terminated.

it is because numpy is not installed:

pip install numpy
// if you get a permission error
sudo pip install numpy

Note that the Caffe path must not contain Chinese characters; otherwise Python keeps reporting that it cannot import caffe. (It still hurts to think about it!)
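To catch this before spending an afternoon on it, you could test the path for anything outside printable ASCII. A small sketch, assuming GNU grep; `has_non_ascii` is a name invented for this example.

```shell
# Hypothetical helper: succeed if the given string contains any byte
# outside the printable ASCII range (space through tilde).
has_non_ascii() {
    printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'
}
```

For example: `has_non_ascii "$PWD" && echo "move caffe to a path with ASCII characters only"`, run from inside the caffe directory.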

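That completes the installation of the Caffe GPU version. Have fun! If this post helped you, feel free to repost it; please credit the author and link to my blog. Thanks!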


Reposted from blog.csdn.net/lrwwll/article/details/77531733