[Troubleshooting] CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace b

Problem description

CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Figure 1
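As a minimal sketch of the hint in the error message, the variable can be set from Python before PyTorch is imported. This makes kernel launches synchronous, so the stack trace should point at the call that actually failed. The commented-out import is only a reminder of where the real import would go.

```python
import os

# Set this before the first `import torch`, i.e. before CUDA is initialized.
# With blocking launches, the error surfaces at the call that caused it
# instead of at some later, unrelated API call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import the framework only after the variable is set
```

Setting it in the shell works as well, for example `CUDA_LAUNCH_BLOCKING=1 python train.py`, where `train.py` is a placeholder script name. Expect the program to run more slowly while the variable is set, so it is best used only for debugging.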

Cause

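This error is relatively uncommon. It is not a compile-time error but a low-level runtime error raised inside a CUDA kernel, and it is essentially caused by an index going out of range. Because kernels run asynchronously, the stack trace may point at a line other than the one that actually failed.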
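One way to catch an out-of-range index before it reaches the GPU is to validate class labels on the host. The sketch below is an illustration only: `find_bad_labels` and `num_classes` are hypothetical names, and it assumes the labels are integer class indices that must lie in the range 0 to `num_classes - 1`.

```python
def find_bad_labels(targets, num_classes):
    """Return (position, value) pairs for labels outside [0, num_classes - 1]."""
    # Tensors and arrays expose .flatten().tolist(); plain sequences are used as-is.
    if hasattr(targets, "flatten"):
        values = targets.flatten().tolist()
    else:
        values = list(targets)
    return [(i, v) for i, v in enumerate(values) if not 0 <= v < num_classes]


# Labels 3 and -1 are invalid for a 3-class problem (valid labels: 0, 1, 2).
bad = find_bad_labels([0, 2, 3, -1], num_classes=3)
print(bad)  # [(2, 3), (3, -1)]
```

Running a check like this on one batch before computing the loss turns an opaque kernel failure into a readable list of offending positions.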

Solution

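After checking, I found that my problem was in the loss function. BCELoss takes the logarithm of its input, so the input must be a probability in [0, 1]; a raw model output (logit) can fall outside that range and therefore outside the domain of the log function, and the loss fails. The fix is to switch to BCEWithLogitsLoss, which takes raw logits and applies the sigmoid internally. CrossEntropyLoss can produce the same error; since it already accepts raw scores, the usual trigger there is a class label outside the valid range, which is the out-of-range index described above.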
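The difference can be illustrated in plain Python, without the framework. This is a sketch of the underlying math, not PyTorch's implementation: `bce_from_probability` and `bce_with_logits` are hypothetical helper names, and the second one uses a common numerically stable form of the same loss.

```python
import math


def bce_from_probability(p, y):
    """Binary cross-entropy on a probability p; only defined for 0 < p < 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def bce_with_logits(x, y):
    """The same loss computed directly from a raw logit x, valid for any real x."""
    # Stable rewrite of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))].
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))


logit, label = 2.5, 1.0

# Passing a raw logit where a probability is expected leaves the domain of log:
# log(1 - 2.5) is undefined, so math.log raises ValueError.
try:
    bce_from_probability(logit, label)
    domain_error = False
except ValueError:
    domain_error = True

# Applying the sigmoid first, or using the logit form, gives the same finite loss.
probability = 1.0 / (1.0 + math.exp(-logit))
loss_from_probability = bce_from_probability(probability, label)
loss_from_logit = bce_with_logits(logit, label)
```

In PyTorch terms, this corresponds to either applying a sigmoid before `BCELoss` or passing the raw outputs to `BCEWithLogitsLoss`; the latter is generally the more numerically stable choice.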

Reposted from blog.csdn.net/Michael_Cretu_/article/details/125363236