GpuArrayException: No cuda device available: troubleshooting attempts

Problem:

The following error appears on import keras or import theano:

>>> import keras
Using Theano backend.
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 227, in <module>
    use(config.device)
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 214, in use
    init_dev(device, preallocate=preallocate)
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 99, in init_dev
    **args)
  File "pygpu/gpuarray.pyx", line 658, in pygpu.gpuarray.init
  File "pygpu/gpuarray.pyx", line 587, in pygpu.gpuarray.pygpu_init
GpuArrayException: No cuda device available

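Searching turned up very few solutions, which was frustrating.

After running pip uninstall theano and reinstalling with conda install theano, an even stranger problem appeared. Searching suggested it was caused by an incompatibility between theano 1.0.4 and numpy 1.16.0, so I uninstalled that build as well.

After reinstalling with pip install theano, the same error still occurred: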

>>> import theano
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "/data_d/old_home/home/.conda/envs/ib/python2.7/site-packages/theano/gpuarray/__init__.py", line 227, in <module>
    use(config.device)
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 214, in use
    init_dev(device, preallocate=preallocate)
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 99, in init_dev
    **args)
  File "pygpu/gpuarray.pyx", line 658, in pygpu.gpuarray.init
  File "pygpu/gpuarray.pyx", line 587, in pygpu.gpuarray.pygpu_init
GpuArrayException: No cuda device available

The rest of the configuration is as follows:

[global]
floatX = float32
device =cuda
[cuda]
root=/usr/local/cuda-8.0

## The above is the .theanorc file

echo $PATH
/data_d/old_home/home/.conda/envs/bin:/usr/local/cuda-8.0/bin:/data_d/public/miniconda2/bin:/usr/local/cuda-9.0/bin:/usr/local/sbin:
/usr/local/bin:/usr/sbin:/usr/bin:/s:/usr/local/cuda-8.0/bin/local/games:/snap/bin:/usr/local/cuda-8.0/bin

CUDA_VISIBLE_DEVICES=1
CUDA_HOME=/usr/local/cuda-8.0
PATH="$PATH:/usr/local/cuda-8.0/bin"
LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64"

# The lines above are from the .bashrc file
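
The PATH shown above contains both /usr/local/cuda-8.0/bin and /usr/local/cuda-9.0/bin, and .bashrc sets CUDA_VISIBLE_DEVICES=1. A minimal sketch for spotting this kind of mixture is given below. The helper name cuda_versions_on_path is hypothetical, and the regular expression assumes toolkit directories are named cuda-X.Y as they are on this machine.

```python
import os
import re


def cuda_versions_on_path(path_value):
    """Return the distinct CUDA versions named in a PATH-style string,
    in order of first appearance."""
    versions = []
    # PATH entries are separated by ":" on Linux, the platform used here.
    for entry in path_value.split(":"):
        match = re.search(r"cuda-(\d+\.\d+)", entry)
        if match and match.group(1) not in versions:
            versions.append(match.group(1))
    return versions


if __name__ == "__main__":
    found = cuda_versions_on_path(os.environ.get("PATH", ""))
    print("CUDA versions on PATH:", found)
    print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
```

More than one version in the result means that which nvcc is picked up depends on entry order. CUDA_VISIBLE_DEVICES restricts CUDA programs to the listed device indices, so if the only CUDA-capable GPU has index 0, a value of 1 would hide it. Whether that applies here is not confirmed, but the variable is worth checking.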

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR      6
#define CUDNN_MINOR      0
#define CUDNN_PATCHLEVEL 21
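
The three defines above together give cuDNN 6.0.21. As a rough sketch, the same check can be scripted so that it is easy to repeat after every reinstall. The function name cudnn_version is hypothetical, and the header path in the usage part is the one used in the command above.

```python
import re


def cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of cudnn.h.

    Returns None if any of the three defines is missing.
    """
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+" + name + r"\s+(\d+)", header_text)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    header_path = "/usr/local/cuda/include/cudnn.h"
    try:
        with open(header_path) as handle:
            print("cuDNN version:", cudnn_version(handle.read()))
    except IOError:
        print("could not read", header_path)
```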

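The theano version in use is 1.0.4, with pygpu 0.7.6.

I also suspected that the owner of the cuda-8.0 folder had been changed. Right after installation it should have been me, but at some point it became root. Changing the owner back to me did not help either, so the approach taken here was to uninstall and reinstall CUDA.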
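
A quick way to confirm the ownership suspicion before reinstalling is sketched below. The helper name owned_by_current_user is hypothetical, and the path is the CUDA root from the .theanorc shown earlier.

```python
import os


def owned_by_current_user(path):
    """Return True if `path` is owned by the user running this script."""
    return os.stat(path).st_uid == os.getuid()


if __name__ == "__main__":
    cuda_root = "/usr/local/cuda-8.0"  # root path taken from .theanorc
    if os.path.exists(cuda_root):
        print(cuda_root, "owned by current user:",
              owned_by_current_user(cuda_root))
    else:
        print(cuda_root, "does not exist on this machine")
```

An installation done with sudo is normally owned by root, so a False result is not necessarily a problem as long as the files remain readable by other users.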

Running a test program produces the same error:

Using Theano backend.
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 227, in <module>
    use(config.device)
  File "/data_d/old_home/home/.conda/envs/xhs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 214, in use
    init_dev(device, preallocate=preallocate)
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 99, in init_dev
    **args)
  File "pygpu/gpuarray.pyx", line 658, in pygpu.gpuarray.init
  File "pygpu/gpuarray.pyx", line 587, in pygpu.gpuarray.pygpu_init
GpuArrayException: No cuda device available
Training -----------
('train cost: ', array(4.1908903, dtype=float32))
('train cost: ', array(0.10415509, dtype=float32))
('train cost: ', array(0.01151281, dtype=float32))
('train cost: ', array(0.00458441, dtype=float32))

Testing ------------
40/40 [==============================] - 0s 5us/step
('test cost:', 0.005374030210077763)
('Weights=', array([[0.56634265]], dtype=float32), '\nbiases=', array([2.001063], dtype=float32))

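// So why can't CUDA be detected? (Training still ran to completion above, so Theano evidently fell back to the CPU.)

Attempt 1:

I changed the device in the config file to cuda0. The result on import theano: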

[global]
floatX = float32
device =cuda0
[cuda]
root=/usr/local/cuda-8.0

>>> import theano
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "/data_d/old_home/home/.conda/env/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 227, in <module>
    use(config.device)
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 214, in use
    init_dev(device, preallocate=preallocate)
  File "/data_d/old_home/home/.conda/envs/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 99, in init_dev
    **args)
  File "pygpu/gpuarray.pyx", line 658, in pygpu.gpuarray.init
  File "pygpu/gpuarray.pyx", line 587, in pygpu.gpuarray.pygpu_init
GpuArrayException: GPU is too old for CUDA version

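Setting this problem aside for now: according to https://blog.csdn.net/qq_33200967/article/details/80689543 I needed to check whether CUDA was installed successfully. Building the CUDA samples with plain make failed (see https://devtalk.nvidia.com/default/topic/1048902/cuda-setup-and-installation/cuda-samples-ubuntu-make-file-errors/), so I used sudo make -k instead. Running deviceQuery then printed: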

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVS 315"
  CUDA Driver Version / Runtime Version          9.0 / 8.0
  CUDA Capability Major/Minor version number:    2.1
  Total amount of global memory:                 963 MBytes (1010040832 bytes)
  ( 1) Multiprocessors, ( 48) CUDA Cores/MP:     48 CUDA Cores
  GPU Max Clock rate:                            1046 MHz (1.05 GHz)
  Memory Clock rate:                             875 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 65536 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = NVS 315
Result = PASS
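
The deviceQuery output above reports compute capability 2.1 and a 9.0 driver alongside an 8.0 runtime. Compute capability 2.x is the Fermi generation, which the CUDA 9.0 toolkit no longer supports, so this may be related to the "GPU is too old for CUDA version" message seen earlier. That link is an assumption and is not confirmed here. A minimal sketch for pulling these fields out of saved deviceQuery output follows. The function name parse_device_query is hypothetical.

```python
import re


def parse_device_query(text):
    """Pull driver/runtime versions and compute capability out of
    deviceQuery output. Returns None if either line is missing."""
    versions = re.search(
        r"CUDA Driver Version / Runtime Version\s+(\d+\.\d+) / (\d+\.\d+)",
        text)
    capability = re.search(
        r"CUDA Capability Major/Minor version number:\s+(\d+)\.(\d+)",
        text)
    if versions is None or capability is None:
        return None
    return {
        "driver": versions.group(1),
        "runtime": versions.group(2),
        "capability": (int(capability.group(1)), int(capability.group(2))),
    }


if __name__ == "__main__":
    sample = (
        "  CUDA Driver Version / Runtime Version          9.0 / 8.0\n"
        "  CUDA Capability Major/Minor version number:    2.1\n"
    )
    info = parse_device_query(sample)
    print(info)
    # Tuples compare element by element, so (2, 1) < (3, 0) is True.
    print("older than compute capability 3.0:", info["capability"] < (3, 0))
```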

Check the NVIDIA driver version (see https://blog.csdn.net/s_sunnyy/article/details/64121826):

cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  384.130  Wed Mar 21 03:37:26 PDT 2018
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)

Check the NVIDIA devices on this machine:

:/dev$ ls -l nvidia*
crw-rw-rw- 1 root root 195,   0 5月  17 12:53 nvidia0
crw-rw-rw- 1 root root 195,   1 5月  17 12:53 nvidia1
crw-rw-rw- 1 root root 195, 255 5月  17 12:53 nvidiactl
crw-rw-rw- 1 root root 195, 254 5月  17 12:53 nvidia-modeset
crw-rw-rw- 1 root root 240,   0 5月  17 12:53 nvidia-uvm

Check the cudnn version with conda list -n username:

cudatoolkit               10.0.130                      0  
cudnn                     7.3.1                cuda10.0_0

These versions seem too high (see https://blog.csdn.net/li57681522/article/details/82491617).

The installed cudatoolkit and cudnn packages both target CUDA 10.0.

But in fact, I never installed CUDA 10.0 myself.
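
The mismatch between the conda packages and the system toolkit under /usr/local/cuda-8.0 can also be detected from saved conda list output. The sketch below assumes the three-column layout (name, version, build) shown in the listing above. Both helper names are hypothetical.

```python
def parse_conda_list(text):
    """Map package name -> (version, build) from `conda list` output."""
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip comment lines and anything without name/version/build.
        if len(fields) >= 3 and not line.startswith("#"):
            packages[fields[0]] = (fields[1], fields[2])
    return packages


def major_minor(version):
    """Reduce a version such as '10.0.130' to '10.0'."""
    return ".".join(version.split(".")[:2])


if __name__ == "__main__":
    listing = (
        "cudatoolkit               10.0.130                      0  \n"
        "cudnn                     7.3.1                cuda10.0_0\n"
    )
    packages = parse_conda_list(listing)
    toolkit = major_minor(packages["cudatoolkit"][0])
    system_cuda = "8.0"  # version of /usr/local/cuda-8.0 from .theanorc
    print("conda cudatoolkit:", toolkit, "| system CUDA:", system_cuda)
    print("versions match:", toolkit == system_cuda)
```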

So I tried uninstalling them:

conda uninstall cudnn
Fetching package metadata ...........
Solving package specifications: .

Package plan for package removal in environment /data_d/old_home/home/xhs/.conda/envs:

The following packages will be REMOVED:

    cudnn: 7.3.1-cuda10.0_0

Proceed ([y]/n)? y

conda uninstall cudatoolkit
Fetching package metadata ...........
Solving package specifications: .

Package plan for package removal in environment /data_d/old_home/home/xhs/.conda/envs:

The following packages will be REMOVED:

    cudatoolkit: 10.0.130-0
    cupti:       10.0.130-0

Proceed ([y]/n)? y

Then I installed the versions matching CUDA 8.0:

conda install cudatoolkit=8.0
Fetching package metadata ...........
Solving package specifications: .

Package plan for installation in environment /data_d/old_home/home/xhs/.conda/envs:

The following NEW packages will be INSTALLED:

    cudatoolkit: 8.0-3

Proceed ([y]/n)? y

conda install cudnn=6.0
Fetching package metadata ...........
Solving package specifications: .

Package plan for installation in environment /data_d/old_home/home/xhs/.conda/env:

The following NEW packages will be INSTALLED:

    cudnn: 6.0.21-cuda8.0_0

Proceed ([y]/n)? y

cudatoolkit               8.0                           3  
cudnn                     6.0.21                cuda8.0_0

The listing above is the result of querying the packages again.

The same error still occurs:

GpuArrayException: No cuda device available

Try switching to CUDA 9.0?
