A strange "cudnn PoolForward launch failed" error

Today, while running code over a remote connection on a server's GPUs, I hit a strange error:

InternalError (see above for traceback): cudnn PoolForward launch failed
	 [[Node: MaxPool = MaxPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 1, 20], padding="VALID", strides=[1, 1, 1, 2], _device="/job:localhost/replica:0/task:0/device:GPU:0"](Relu)]]
	 [[Node: Neg/_5 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_244_Neg", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

The bizarre part is that when I use

os.environ["CUDA_VISIBLE_DEVICES"] = "1" 

to select GPU 1, i.e. the 1080 Ti, no error is raised.

But when I use

os.environ["CUDA_VISIBLE_DEVICES"] = "0" 

to select GPU 0, i.e. the 2080 Ti, the error appears.

Yet according to nvidia-smi,

the 1080 Ti is GPU 0 and the 2080 Ti is GPU 1.

That makes no sense.
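
One possible explanation for the numbering mismatch (the post does not confirm it): by default the CUDA runtime enumerates devices fastest-first, while nvidia-smi lists them in PCI bus order, so the two index schemes can disagree on a machine with mixed cards. A minimal sketch, assuming the variables are set before TensorFlow or any other CUDA library initializes; `select_gpu` is a hypothetical helper name, not something from the original code:

```python
import os


def select_gpu(index, env=os.environ):
    """Expose a single GPU, numbered the same way nvidia-smi numbers it."""
    # Make CUDA enumerate devices by PCI bus ID instead of fastest-first.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Only the chosen card is visible; inside the process it becomes GPU:0.
    env["CUDA_VISIBLE_DEVICES"] = str(index)
    return env
```

Calling `select_gpu(0)` at the very top of the script, before `import tensorflow`, should then pick whichever card nvidia-smi shows as 0. This only addresses the index confusion; it does not by itself explain why the 2080 Ti fails.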

The log printed in PyCharm contains:

2020-09-30 12:49:41.147085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10230 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:03:00.0, compute capability: 7.5)

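
The log line above already states which physical card TensorFlow attached to, including its PCI bus id, which can be compared with the Bus-Id column in nvidia-smi. As a rough sketch, the fields can be pulled out with a regular expression; the pattern and the `parse_device_line` name are my own assumptions, written against this one log line only, and other TensorFlow versions may format it differently:

```python
import re

# The log line from this post, copied verbatim.
LOG_LINE = (
    "2020-09-30 12:49:41.147085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] "
    "Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10230 MB memory) "
    "-> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:03:00.0, "
    "compute capability: 7.5)"
)

# Assumed pattern, matched against the line format shown above.
PATTERN = re.compile(
    r"Created TensorFlow device \((?P<tf_device>\S+) with (?P<memory_mb>\d+) MB memory\) "
    r"-> physical GPU \(device: (?P<cuda_index>\d+), name: (?P<name>[^,]+), "
    r"pci bus id: (?P<pci_bus_id>[^,]+), compute capability: (?P<capability>[\d.]+)\)"
)


def parse_device_line(line):
    """Return the device fields as a dict of strings, or None if no match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


info = parse_device_line(LOG_LINE)
print(info["name"], info["pci_bus_id"])
```

Here the visible device is the RTX 2080 Ti on bus 0000:03:00.0, so whatever index nvidia-smi shows for that bus id is the card that was actually used.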

In principle both cards have enough memory, so this should not be an out-of-memory problem...


Reposted from blog.csdn.net/weixin_39518984/article/details/108881667