Problems I ran into while setting up a GPU environment for TensorFlow

Download

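From TensorFlow 2.1 onward, the single tensorflow pip package covers both CPU and GPU. For 2.0 and most 1.x releases you still have to choose between tensorflow (CPU-only) and tensorflow-gpu.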
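The packaging rule can be written down as a tiny helper. This is only a sketch: pip_package is a name made up for this example, and the 2.1 cutoff is my reading of the release notes, so check the notes for the exact version you install.

```python
def pip_package(tf_version, want_gpu):
    """Suggest a pip package name for a TensorFlow version (hypothetical helper)."""
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    if (major, minor) >= (2, 1):
        # Assumption: from 2.1 on, one package covers both CPU and GPU.
        return "tensorflow"
    # Older releases: GPU support lives in a separate package.
    return "tensorflow-gpu" if want_gpu else "tensorflow"


print(pip_package("2.3.0", want_gpu=True))   # tensorflow
print(pip_package("1.14.0", want_gpu=True))  # tensorflow-gpu
```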

Fixing the tensorflow-gpu 2.x error "Could not load dynamic library 'cudart64_101.dll'"

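cudart64_101.dll ships only with CUDA 10.1, but my machine has CUDA 10.2 installed, and the TensorFlow build I was using looks for the 10.1 runtime by name. Copying cudart64_101.dll into "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin" solved the problem.
Download link for cudart64_101.dll: cudart64_101.dll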
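Before copying DLLs around, it can help to check whether the file is already somewhere the loader looks. Here is a minimal sketch, assuming the DLL is resolved through the directories on PATH (Windows also searches a few other locations). dirs_containing is a helper name made up for this example.

```python
import os
from pathlib import Path


def dirs_containing(dll_name, search_path=None):
    """Return the directories on search_path that contain dll_name."""
    if search_path is None:
        # Default to the current process's PATH.
        search_path = os.environ.get("PATH", "")
    hits = []
    for entry in search_path.split(os.pathsep):
        # Skip empty PATH entries; keep directories that hold the file.
        if entry and (Path(entry) / dll_name).is_file():
            hits.append(entry)
    return hits


# An empty list means no directory on PATH has the DLL.
print(dirs_containing("cudart64_101.dll"))
```

If the list comes back empty after the copy, the CUDA bin directory is probably missing from PATH.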

UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

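After dropping in the 10.1 DLL and running the code again, this error showed up. Enabling memory growth on the GPU right after the imports fixes it: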

import tensorflow as tf

# Let TensorFlow grow GPU memory on demand instead of reserving it all at startup.
physical_devices = tf.config.experimental.list_physical_devices('GPU')
assert len(physical_devices) > 0, "Not enough GPU hardware devices available"
tf.config.experimental.set_memory_growth(physical_devices[0], True)
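On a machine with more than one GPU, the same setting can be applied to every device. The sketch below takes the tf.config module as an argument so it is easy to try out. enable_memory_growth is a name made up for this example, and it uses only the two tf.config.experimental calls already shown above.

```python
def enable_memory_growth(tf_config):
    """Turn on memory growth for every visible GPU; return how many were set.

    tf_config is expected to be the tf.config module.
    """
    gpus = tf_config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        tf_config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


# Typical use, before any tensors or models are created:
#   import tensorflow as tf
#   enable_memory_growth(tf.config)
```

Memory growth has to be set before TensorFlow initializes the GPUs, so call this right after the imports.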

ResourceExhaustedError: OOM when allocating tensor with shape[32,64,125,125] and type float on xxx

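The error message says OOM, meaning out of memory: the GPU does not have enough memory to allocate this tensor, so the batch_size used for each training step needs to be reduced.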
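To see why a smaller batch helps, here is a rough size estimate for the tensor in the error message. It assumes "type float" means 4-byte float32; tensor_bytes is a helper name made up for this example.

```python
import math


def tensor_bytes(shape, bytes_per_element=4):
    """Memory needed for one dense tensor, assuming float32 by default."""
    return math.prod(shape) * bytes_per_element


# The first dimension of the shape in the error is the batch size (32).
print(tensor_bytes([32, 64, 125, 125]))  # 128000000 bytes, about 128 MB
print(tensor_bytes([16, 64, 125, 125]))  # 64000000 bytes after halving the batch
```

This is just one tensor; a training step holds many of them at once, so halving batch_size cuts the activation memory roughly in half across the whole network.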

That wraps up this round of bug fixing; the program now runs normally.

Reposted from blog.csdn.net/Geek_/article/details/109284240