Python/PyTorch: errors and exceptions from my own coding

1. A function call returns `None` (`NoneType`)
2. Image saved with matplotlib.pyplot is blank
3. `TypeError: Image data of dtype object cannot be converted to float`
4. `RuntimeError: stack expects each tensor to be equal size, but got [2, 5] at entry 0 and [1, 5] at entry 1`
5. `torch.gather() RuntimeError: Size does not match at dimension 0 get 2 vs 1`
6. `TypeError: list indices must be integers or slices, not tuple`
7. Exception raised: `ModuleNotFoundError No module named '***'`
- 7.1 One possible fix
- 7.2 Absolute path: the simplest and most direct fix
8. `ValueError: could not convert string to float: 'r'`
9. `RuntimeError: CUDA error: an illegal memory access was encountered`
10. `UnpicklingError: A load persistent id instruction was encountered,but no persistent_load function was specified.`
- 10.1 When it happens: a .pth file trained on Google Colab is downloaded and loaded on a local CPU machine
- 10.2 Fixes
  - 10.2.1 Method 1
  - 10.2.2 Method 2
11. `TypeError: new() received an invalid combination of arguments - got (numpy.float64, int), but expected one of: * (*, torch.device device)`
12. `EmptyDataError: No columns to parse from file`
13. `AttributeError: predict_proba is not available when probability=False`
14. `(null): can't open file '***': [Errno 2] No such file or directory`
15. `OSError: [WinError 123]` 文件名、目录名或卷标语法不正确。 (The filename, directory name, or volume label syntax is incorrect.)
16. `TypeError: **func() got an unexpected keyword argument ***`
17. `RuntimeError: pad should be smaller than half of kernel size, but got padW = 1, padH = 1, kW = 1, kH = 1`
18. `RuntimeError: **.pt is a zip archive(did you mean to use torch.jit.load()?)`
19. `'pip' 不是内部或外部命令，也不是可运行的程序或批处理文件` ('pip' is not recognized as an internal or external command, operable program or batch file)
20. `ipykernel_launcher.py: error: unrecognized arguments: -f /root/.local/share/jupyter/runtime/kernel-347b9f63-bf6e-4d91-8b6f-9a9b9b10e20a.json An exception has occurred, use %tb to see the full traceback.`
21. `raise TypeError("Invalid dimensions for image data") TypeError: Invalid dimensions for image data`
22. `RuntimeError: Expected object of scalar type Float but got scalar type Double for argument #4 'mat1'`
23. `RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [16, 64, 160, 160]], which is output 0 of ReluBackward1, is at version 2; expected version 1 instead. Hint: enable anomal`①
24. `TypeError: forward() takes 1 positional argument but 2 were given`
- 24.1 Loop over the modules with a for loop in forward
- 24.2 Replace nn.ModuleList(list) with nn.Sequential(*list)
25. `ValueError: num_samples should be a positive integer value, but got num_samples=0`
26. `BrokenPipeError: [Errno 32] Broken pipe`
27. `RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10, 6]], which is output 0 of NativeBatchNormBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).`②
28. Two SyntaxErrors when defining functions/classes
- 28.1 `SyntaxError: positional argument follows keyword argument`
- 28.2 `SyntaxError: non-default argument follows default argument`
29. `FileNotFoundError: [Errno 2] No such file or directory:`
30. `ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()`
31. `FileNotFoundError: [WinError 3]` 系统找不到指定的路径。 (The system cannot find the path specified.)
32. `SyntaxError: invalid syntax` when a string contains double/single quotes
33. `OSError: [Errno 22] Invalid argument: 'E:\x01.pkl'`
- 33.1 Write every single backslash as a double backslash
- 33.2 Prefix the string with r (raw string, escapes disabled)
34. `RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.`
35. `TypeError: 'Tensor' object is not callable`
36. `AttributeError: 'builtin_function_or_method' object has no attribute 'fftn'`
37. `TypeError: __main__.conv_relu is not a Module subclass`
37. pandas arithmetic on loaded data raises: `TypeError: unsupported operand type(s) for +: 'int' and 'str'`
38. `AttributeError: 'builtin_function_or_method' object has no attribute 'fftn'`
38. `ValueError: a must be 1-dimensional`
39. `ImportError: cannot import name 'evaluate' from 'test' (c:\python37\lib\test\__init__.py)`
40. `ValueError: dictionary update sequence element #0 has length 1; 2 is required`
41. Errors during compilation
- 41.1 `warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))`
42. Loading weights fails after training with nn.DataParallel
- 42.1 Change the saving code and regenerate the weight file
- 42.2 Strip the "module." prefix from the keys in code, then load/re-save the weights
43. Error reading a pkl file with python/numpy
44. `ValueError: Usecols do not match columns, columns expected but not found:`
50. Environment setup
- 50.1 `Error checking compiler version for cl`
51. PyTorch Not Implemented Error
References

1. A function call returns `None` (`NoneType`)

def g(x,n):
    if (x%2!=0):
        print(n)
        return n
    else:
        g(x/2,n+1)
print(g(8,0))

Output:

3
None

Why does the value print correctly inside the function, yet the caller receives None?
Change the code as follows:

def g(x,n):
    if (x%2!=0):
        print(n)
        return n
    else:
        return g(x/2,n+1)
print(g(8,0))

Output:

3
3

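That fixes it. In a recursive function, the recursive call itself must be returned. Without the return, the innermost call returns n, but every outer call falls off the end of the function and returns None.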

2. Image saved with matplotlib.pyplot is blank

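The most likely cause is that plt.savefig() is called after plt.show(). Once plt.show() returns, the figure has been closed, so savefig() writes a new, empty figure.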

plt.imshow(image_roi,cmap="gray")
plt.show()
plt.savefig('image_roi.png')

Swap the order and the problem should go away.

plt.imshow(image_roi,cmap="gray")
plt.savefig('image_roi.png')
plt.show()

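Also note the cmap argument in plt.imshow(img, cmap="gray"). Without it, a single-channel image is drawn with matplotlib's default colormap and looks greenish instead of grayscale.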

3.`TypeError: Image data of dtype object cannot be converted to float`

One case: the image was first processed with cv2, and plt.imshow() then raises the error above. The following call works:

plt.imshow(img.get())
plt.show()

Here img is a cv2.UMat.
img.get() converts it to a numpy array,
after which it displays normally.

4.`RuntimeError: stack expects each tensor to be equal size, but got [2, 5] at entry 0 and [1, 5] at entry 1`

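This error is triggered by the PyTorch DataLoader.
It shows up when images carry a varying number of labels.
For example, in SSD each img has size [3,300,300].
If the annotation holds a single object, the target size is [1,5]; with two objects it is [2,5], and so on.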

data_loader = DataLoader(dataset,batch_size=2, num_workers=1,shuffle=True,pin_memory=True)

The full error message:

/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
     53             storage = elem.storage()._new_shared(numel)
     54             out = elem.new(storage)
---> 55         return torch.stack(batch, 0, out=out)
     56     elif elem_type.__module__ == 'numpy' and elem_type.__name__ != 'str_' \
     57             and elem_type.__name__ != 'string_':

RuntimeError: stack expects each tensor to be equal size, but got [4, 5] at entry 0 and [2, 5] at entry 1

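The DataLoader's default collate function merges the samples of a batch with torch.stack(), so as soon as a batch holds more than one sample and the targets differ in size, the stack fails with the error above.
One solution is to define a custom collate_fn: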

def detection_collate(batch):
    """Custom collate fn for dealing with batches of images that have a different
    number of associated object annotations (bounding boxes).
    Arguments:
        batch: (tuple) A tuple of tensor images and lists of annotations
    Return:
        A tuple containing:
            1) (tensor) batch of images stacked on their 0 dim
            2) (list of tensors) annotations for a given image are stacked on 0 dim
    """
    targets = []
    imgs = []
    for sample in batch:
        imgs.append(sample[0])
        targets.append(torch.FloatTensor(sample[1]))
    return torch.stack(imgs, 0), targets

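Collecting the targets in a list with append sidesteps the size mismatch. The function is handed to the loader as DataLoader(dataset, batch_size=2, collate_fn=detection_collate).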

5.`torch.gather() RuntimeError: Size does not match at dimension 0 get 2 vs 1`

For example:

t = torch.tensor([[1,2],[3,4]])
index=torch.tensor([[0]])
torch.gather(t, 1, index)

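For a 2-D tensor t, gather requires index to match t in size along every dimension other than the one being gathered (dim=1 here); otherwise the size-mismatch error is raised. (Recent PyTorch releases only require index.size(d) <= input.size(d) for d != dim, so this exact example may run without error there.)
The version below works: t has size 2×2 and index has size 2×1, so when gathering along dim=1 both have size 2 along dim=0.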

t = torch.tensor([[1,2],[3,4]])
index=torch.tensor([[0],[1]])
torch.gather(t, 1, index)

6.`TypeError: list indices must be integers or slices, not tuple`

Indexing a nested list with array-style syntax such as a[0,:] raises this error.

a=[[0,1,2],[1,2,3]]
a[0][:]
#[0, 1, 2]
a[0,:]
#TypeError: list indices must be integers or slices, not tuple
#Convert the list to an np.array
import numpy as np
a=np.array(a)
a[0,:]
#array([0, 1, 2])
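
If converting to NumPy is not wanted, rows and columns of a nested list can still be picked with plain indexing and a list comprehension. A minimal sketch; the helper name `column` is made up for illustration:

```python
a = [[0, 1, 2], [1, 2, 3]]

def column(rows, j):
    """Return the j-th element of every row (the list counterpart of a[:, j])."""
    return [row[j] for row in rows]

first_row = a[0][:]         # list counterpart of a[0, :] on an array
second_col = column(a, 1)   # list counterpart of a[:, 1] on an array
print(first_row, second_col)
```

For anything beyond occasional slicing, converting to np.array as shown above is usually the more convenient route.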

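7. Exception raised: `ModuleNotFoundError No module named '***'`

This usually happens when calling your own functions across folders.

A workspace PRJ contains two folders, a and b. Folder a holds a1.py and folder b holds b1.py.
a1.py defines a print function: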

def printf():
    print("From a1.py")

To call printf() from a1.py inside b1.py, the following statement alone is not enough; it raises the module-not-found error.

from a import a1

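7.1 One possible fix

import os
import sys
sys.path.append(os.getcwd())

os.getcwd() returns the current working directory,
so this appends the current working directory to sys.path (it assumes the script is launched from PRJ).
The complete b1.py is: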

import sys
import os 
sys.path.append(os.getcwd())
#print(sys.path)
from a import a1

if __name__ == "__main__":
    a1.printf()

The call now works and prints:

#From a1.py
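
Because os.getcwd() depends on where the script is started from, the same idea can be made independent of the working directory by deriving the project root from the script's own location. A sketch under that assumption; `add_project_root` is a hypothetical helper, not part of any library:

```python
import os
import sys

def add_project_root(script_file, levels_up=1):
    """Append the directory `levels_up` levels above script_file's folder to sys.path."""
    root = os.path.dirname(os.path.abspath(script_file))
    for _ in range(levels_up):
        root = os.path.dirname(root)
    if root not in sys.path:
        sys.path.append(root)
    return root

# Inside PRJ/b/b1.py one would call add_project_root(__file__);
# a literal path is used here only so the snippet runs on its own.
root = add_project_root(os.path.join("PRJ", "b", "b1.py"))
print(root)
```

With the root on sys.path, `from a import a1` should resolve regardless of the directory the script is launched from.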

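7.2 Absolute path: the simplest and most direct fix

Example: drive E: contains folders 1 and 2. Folder 1 holds hello.py and welcome.py; folder 2 holds greeting.py. How can greeting.py use the functions/modules defined in hello and welcome?

It is simple: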

import sys
sys.path.append("E://1")

from welcome import *
from hello import *

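Adding the folder that contains the modules to sys.path is enough, and the module-not-found error goes away. Write the path with double slashes or double backslashes to avoid escape problems:
E://1 or E:\\1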

8. `ValueError: could not convert string to float: 'r'`

For example:

import numpy as np
a={0:'q',1:'w',2:'e',3:'r'}
# b=np.zeros(5,dtype=str)
b=np.zeros(5)
c=[3,2,0,1,2]
for i in range(len(c)):
    b[i]=a[c[i]]

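Result:

This raises ValueError: could not convert string to float: 'r'.

The cause is that b is initialized with element type numpy.float64,
while the dictionary values are of type str.
Initialize b differently, for example:

b=np.zeros(5,dtype=str)
or:
b=np.array(['']*5)

That fixes it. (np.str was removed in NumPy 1.24; the builtin str is equivalent. Both forms create fixed-width one-character string arrays, so longer strings would be truncated.)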

9. `RuntimeError: CUDA error: an illegal memory access was encountered`

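This error appeared while training on Google Colab.
I am not sure of the cause, but the error did go away.
Suspected cause: another Colab program was opened and running on the GPU at the same time; because CUDA calls execute asynchronously, the failure then surfaces in the current program. See https://www.jianshu.com/p/d1e7a7480539.
Two things that resolved it:

Set CUDA_LAUNCH_BLOCKING=1 to disable asynchronous execution of CUDA calls;
Close the other programs that are using the GPU.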
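
One way to set the variable from inside a script or notebook is through os.environ, before CUDA is initialized (simplest: before importing torch). This is a sketch of that idea; it is mainly a debugging aid, since synchronous launches make the traceback point at the operation that actually failed but also slow execution down.

```python
import os

# Must be set before the first CUDA call; doing it before `import torch` is the safe choice.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import the framework only after the variable is set
print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

From a shell the equivalent is to prefix the command, e.g. `CUDA_LAUNCH_BLOCKING=1 python train.py`, where train.py is a placeholder script name.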

10. `UnpicklingError: A load persistent id instruction was encountered,but no persistent_load function was specified.`

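10.1 When it happens: a .pth file trained on Google Colab is downloaded and loaded on a local CPU machine.

Loading used torch.load(*,map_location='cpu'), which should be fine.
The error message itself is not very informative.

10.2 Fixes

Two things that may help:

10.2.1 Method 1

On Colab, load the file with torch.load('*.pth',map_location='cpu') and print its contents.

Copy the printed output and paste it into a local editor such as Spyder;
assign the pasted text to a variable, e.g. weight;
it can then be saved and loaded normally, and used for training or inference on the CPU. Printed tensors are abbreviated with ... once they are large, so this only works for small models.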

import torch
# OrderedDict and tensor are needed because the pasted text refers to them by name
from collections import OrderedDict
from torch import tensor
weight=...
torch.save(weight, '1.pth')
torch.load('./1.pth',map_location=torch.device('cpu'))

10.2.2 Method 2

It may be a torch version compatibility problem (see 18). Try saving with:

torch.save(models.state_dict(),"*.pth",_use_new_zipfile_serialization=False)

11.`TypeError: new() received an invalid combination of arguments - got (numpy.float64, int), but expected one of: * (*, torch.device device)`

  torch.nn.Linear(input_dim,hidden_layer1),

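Most likely a layer size, e.g. m or n in nn.Linear(m, n), is not a Python int (here it is a numpy.float64). Checking the layer definitions usually locates it quickly; casting the value with int() resolves it.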

12.`EmptyDataError: No columns to parse from file`

Most likely the file being read is empty or corrupted.

13.`AttributeError: predict_proba is not available when probability=False`

sklearn.svm.SVC defaults to probability=False when the parameter is not set at construction.
sklearn.ensemble.VotingClassifier with voting='soft' needs the predicted probabilities of each estimator, so it raises this error. From the documentation:

If ‘hard’, uses predicted class labels for majority rule voting. Else if ‘soft’, predicts the class label based on the argmax of the sums of the predicted probabilities,

Set probability=True on svm.SVC and it works.

svm.SVC(kernel='rbf', gamma=0.7, C=1.0,probability=True)

14.`(null): can't open file '***': [Errno 2] No such file or directory`

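The error appeared after cd-ing to the script's folder in cmd and running the Python file.
In my case I had only typed:

python ***.py

but the script needs arguments on the command line:

python ***.py ??? ###

With the full command line the error disappeared.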

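15.`OSError: [WinError 123]` 文件名、目录名或卷标语法不正确。 (The filename, directory name, or volume label syntax is incorrect.)

Python probably could not interpret the path as written. To be safe, separate directories with "\\" or "//" throughout.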

16.`TypeError: **func() got an unexpected keyword argument ***`

A typical case:

f1=lambda p1, p2: p1 + p2
params={'p0': 1, 'p2': 3}
f1(**params)

The parameter names of the function must match the keys of the dictionary unpacked with **.

f1=lambda p1, p2: p1 + p2
params={'p1': 1, 'p2': 3}
f1(**params)

The same holds for a function defined with def.

def f1(p1,p2):return p1+p2
params={'p1': 1, 'p2': 3}
f1(**params)
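
When the dictionary comes from a config file or another module, the mismatching keys can be listed before the call with inspect.signature. A small sketch; `unexpected_keys` is a hypothetical helper written for this note:

```python
import inspect

def unexpected_keys(func, params):
    """Return the keys of params that func would reject as keyword arguments."""
    sig = inspect.signature(func)
    # A **kwargs parameter accepts any keyword, so nothing is unexpected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in sig.parameters.values()):
        return []
    accepted = {
        name for name, p in sig.parameters.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    return [k for k in params if k not in accepted]

f1 = lambda p1, p2: p1 + p2
bad = unexpected_keys(f1, {'p0': 1, 'p2': 3})
good = unexpected_keys(f1, {'p1': 1, 'p2': 3})
print(bad, good)
```

Here the first dictionary is flagged because of the key 'p0', which matches the failing example above.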

17.`RuntimeError: pad should be smaller than half of kernel size, but got padW = 1, padH = 1, kW = 1, kH = 1`

File "C:\Python37\lib\site-packages\torch\nn\functional.py", line 487, in _max_pool2d
    input, kernel_size, stride, padding, dilation, ceil_mode)

The nn.MaxPool2d() arguments are a likely cause, e.g.:

nn.MaxPool2d(1, 2, 1, ceil_mode=True)

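Here kernel_size=1 and padding=1, but the padding may be at most half the kernel size. With kernel_size=1 the padding has to be 0; alternatively enlarge the kernel, e.g. kernel_size=3 with padding=1.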

18.`RuntimeError: **.pt is a zip archive(did you mean to use torch.jit.load()?)`

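This is probably a torch version mismatch: since PyTorch 1.6, torch.save writes a zip-based format that older versions cannot read. Saving in the old format may help: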

torch.save(models.state_dict(),"*.pth",_use_new_zipfile_serialization=False)

This fix follows https://blog.csdn.net/weixin_44769214/article/details/108188126.

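19. `'pip' 不是内部或外部命令，也不是可运行的程序或批处理文件` ('pip' is not recognized as an internal or external command, operable program or batch file)

Suppose C:\Python37\Scripts\ is already on the PATH environment variable and it still fails.
pip itself may have been deleted somehow. In that case it can be reinstalled directly (tested):
open cmd and run python -m ensurepip --default-pip; pip.exe then appears in the Scripts folder of the Python installation.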

20. `ipykernel_launcher.py: error: unrecognized arguments: -f /root/.local/share/jupyter/runtime/kernel-347b9f63-bf6e-4d91-8b6f-9a9b9b10e20a.json An exception has occurred, use %tb to see the full traceback.`

I hit this when using argparse on Google Colab.
Following https://blog.csdn.net/lancecrazy/article/details/92430514,
change args = parser.parse_args() to:

args = parser.parse_known_args()[0]

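Bug fixed!
The referenced post describes the same error in Jupyter,
so it appears to be common to notebook-style environments: the kernel is started with its own -f argument, which parse_args() does not recognize.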
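
The behaviour can be reproduced outside a notebook by passing a simulated argument list. In this sketch the `--lr` option and the JSON path are made-up placeholders; only `parse_known_args` itself is the point:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.01)  # illustrative option

# Simulated command line: the kernel's "-f <file>" plus one real option.
argv = ["-f", "/tmp/kernel-example.json", "--lr", "0.1"]

# parse_known_args returns (namespace, list_of_unrecognized_arguments)
# instead of exiting on arguments it does not know.
args, unknown = parser.parse_known_args(argv)
print(args.lr, unknown)
```

Another option in a notebook is `parser.parse_args(args=[])`, which ignores the kernel's command line entirely and uses the declared defaults.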

21. `raise TypeError("Invalid dimensions for image data") TypeError: Invalid dimensions for image data`

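plt.imshow(img) expects a color image of shape $m \times n \times 3$ (three RGB channels, channels last).
Data in any other layout has to be converted first, otherwise this error is raised.
For example, if img currently has shape torch.Size([1, 3, 256, 256]), convert it as follows: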

import matplotlib.pyplot as plt
img = img.squeeze().permute(1, 2, 0).numpy()#256*256*3
plt.imshow(img)
plt.show()

22. `RuntimeError: Expected object of scalar type Float but got scalar type Double for argument #4 'mat1'`

Setting the tensor's dtype explicitly usually solves it:

x =  torch.tensor(x,dtype=torch.float32)
# or, if x is already a tensor:
x=x.to(torch.float32)

For the cause of the error, see https://blog.csdn.net/weixin_44675362/article/details/108697715

23.`RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [16, 64, 160, 160]], which is output 0 of ReluBackward1, is at version 2; expected version 1 instead. Hint: enable anomal`①

Change the ReLU argument inplace=True to False. See https://zhuanlan.zhihu.com/p/38475183 for details.

24.`TypeError: forward() takes 1 positional argument but 2 were given`

Traceback:

  File "c:\python37\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)

TypeError: forward() takes 1 positional argument but 2 were given

Likely cause: an nn.ModuleList was called directly. ModuleList is only a container and does not define forward, hence the error; see reference [7].
For example:

import torch
import torch.nn as nn

class net(nn.Module):
    def __init__(self):
        super(net, self).__init__()
        self.modlist = nn.ModuleList([
                       nn.Conv2d(1, 20, 5),
                       nn.ReLU(),
                        nn.Conv2d(20, 64, 5),
                        nn.ReLU()
                        ])
    def forward(self, x):
        x=self.modlist(x)
        return x

x = torch.randn(16, 1, 20, 20)
net = net()
print(net(x))

This raises the same error.
Fixes:

24.1 Loop over the modules with a for loop in forward

import torch
import torch.nn as nn

class net(nn.Module):
    def __init__(self):
        super(net, self).__init__()
        self.modlist = nn.ModuleList([
                       nn.Conv2d(1, 20, 5),
                       nn.ReLU(),
                        nn.Conv2d(20, 64, 5),
                        nn.ReLU()
                        ])
    def forward(self, x):
        for m in self.modlist:
            x = m(x)
        return x

x = torch.randn(16, 1, 20, 20)
net = net()
print(net(x))

24.2 Replace nn.ModuleList(list) with nn.Sequential(*list)

import torch
import torch.nn as nn

class net(nn.Module):
    def __init__(self):
        super(net, self).__init__()
        self.modlist = [
                       nn.Conv2d(1, 20, 5),
                       nn.ReLU(),
                        nn.Conv2d(20, 64, 5),
                        nn.ReLU()
                        ]
        self.modlist=nn.Sequential(*self.modlist)
    def forward(self, x):
        x = self.modlist(x)
        return x

x = torch.randn(16, 1, 20, 20)
net = net()
print(net(x))

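This is only a toy example; in practice the sub-modules in the list may be custom or composite modules.
The bug is fixed, but at root it came from not knowing enough about PyTorch internals.
Working through it also gave me a better understanding of how models are assembled.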

25.`ValueError: num_samples should be a positive integer value, but got num_samples=0`

See https://github.com/thstkdgus35/EDSR-PyTorch/issues/185.
The usual cause is a wrong dataset path, so that zero images are read. Check the file paths carefully.

26.`BrokenPipeError: [Errno 32] Broken pipe`

  File "c:\python37\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)

Check whether the main script contains:

if __name__=='__main__':

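If not, add it and move the code that starts worker processes under it. On Windows, multiprocessing starts each worker by re-importing the main script, so unguarded top-level code runs again in every worker.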

27.`RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10, 6]], which is output 0 of NativeBatchNormBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).`②

This one is hard to cover fully, and my own understanding is limited.

The cause is some in-place operation, not necessarily the inplace flag of layers such as ReLU.

The following example triggers the error:

import torch
import torch.nn as nn
class ANN_3layers(nn.Module):
    def __init__(self, input_dim, hidden_layer,output_layer):
        super().__init__()
        self.input_dim = input_dim
        self.hidden_layer = hidden_layer
        self.output_layer = output_layer  
        
        self.nn1=nn.Linear(self.input_dim,self.hidden_layer)
        self.bn1=nn.BatchNorm1d(self.hidden_layer)
        
        self.nn_att=nn.Linear(self.hidden_layer,1)
        self.sigmoid=nn.Sigmoid()   
        
        self.nn2=nn.Linear(self.hidden_layer,self.output_layer)        

    def forward(self,x):
        x=self.nn1(x)
        x=self.bn1(x)
        
        multi_factor=self.nn_att(x)
        multi_factor=self.sigmoid(multi_factor)
        multi_factor=multi_factor.expand(-1,6)  

        x[x<0]=multi_factor[x<0]*x[x<0]
        x=self.nn2(x)
        return x
        
if __name__=='__main__':
    x=torch.rand((10,8))
    y=torch.ones((10,3))
    loss_fn = torch.nn.BCEWithLogitsLoss()
    m=ANN_3layers(8,6,3)
    out=m(x)
    loss=loss_fn(out,y)
    loss.backward()

The corrected version:

import torch
import torch.nn as nn
class ANN_3layers(nn.Module):
    def __init__(self, input_dim, hidden_layer,output_layer):
        super().__init__()
        self.input_dim = input_dim
        self.hidden_layer = hidden_layer
        self.output_layer = output_layer  
        
        self.nn1=nn.Linear(self.input_dim,self.hidden_layer)
        self.bn1=nn.BatchNorm1d(self.hidden_layer)
        
        self.nn_att=nn.Linear(self.hidden_layer,1)
        self.sigmoid=nn.Sigmoid()   
        
        self.nn2=nn.Linear(self.hidden_layer,self.output_layer)        

    def forward(self,x):
        x=self.nn1(x)
        x=self.bn1(x)
        
        multi_factor=self.nn_att(x)
        multi_factor=self.sigmoid(multi_factor)
        multi_factor=multi_factor.expand(-1,6)
        
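        # write the scaled values into a clone so that x itself is never modified in place
        y=x.clone()
        mask=(x<0)
        y[mask]=multi_factor[mask]*x[mask]       
        
        x=self.nn2(y)
        return x
        

if __name__=='__main__':
    x=torch.rand((10,8))
    y=torch.ones((10,3))
    loss_fn = torch.nn.BCEWithLogitsLoss()
    m=ANN_3layers(8,6,3)
    out=m(x)
    loss=loss_fn(out,y)
    loss.backward()

The only difference is that

x[x<0]=multi_factor[x<0]*x[x<0]

was changed to:

y=x.clone()
mask=(x<0)
y[mask]=multi_factor[mask]*x[mask]

with y then passed to nn2. The masked assignment on x was the in-place operation behind the error: it overwrote x, which autograd still needed for the backward pass. Writing into a clone leaves x untouched.
More information:
https://discuss.pytorch.org/t/encounter-the-runtimeerror-one-of-the-variables-needed-for-gradient-computation-has-been-modified-by-an-inplace-operation/836/7
I still cannot always tell in advance which statements count as in-place operations, so I check the suspicious ones one by one.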
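
An out-of-place alternative to masked assignment is torch.where, which builds a new tensor and so never touches the values autograd has saved. The sketch below is my own variant of the model above (the class name `ANN3LayersWhere` is made up), relying on broadcasting instead of expand:

```python
import torch
import torch.nn as nn

class ANN3LayersWhere(nn.Module):
    def __init__(self, input_dim, hidden_layer, output_layer):
        super().__init__()
        self.nn1 = nn.Linear(input_dim, hidden_layer)
        self.bn1 = nn.BatchNorm1d(hidden_layer)
        self.nn_att = nn.Linear(hidden_layer, 1)
        self.sigmoid = nn.Sigmoid()
        self.nn2 = nn.Linear(hidden_layer, output_layer)

    def forward(self, x):
        x = self.bn1(self.nn1(x))
        factor = self.sigmoid(self.nn_att(x))   # shape (N, 1), broadcasts over features
        # Scale only the negative entries; torch.where returns a new tensor.
        x = torch.where(x < 0, factor * x, x)
        return self.nn2(x)

torch.manual_seed(0)
model = ANN3LayersWhere(8, 6, 3)
out = model(torch.rand(10, 8))
loss = nn.BCEWithLogitsLoss()(out, torch.ones(10, 3))
loss.backward()   # completes without the in-place error
print(out.shape)
```

Whether this or the clone-based version reads better is a matter of taste; both avoid modifying the BatchNorm output in place.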

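28. Two SyntaxErrors when defining functions/classes

When defining or calling functions and classes, the way arguments are written can trigger the following two errors.

A short summary follows.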

28.1 `SyntaxError: positional argument follows keyword argument`

def m(x,y):
    return x+y
m(x=3,1)

In a call, positional arguments must come before keyword arguments:

def m(x,y):
    return x+y
m(3,y=1)

28.2 `SyntaxError: non-default argument follows default argument`

If a parameter with a default value (default argument) is placed before one without a default in a function or class definition, this error is raised.

def m(x=1,y):
    return x+y
m(3,1)

Parameters with defaults must come last:

def m(y,x=1):
    return x+y
m(3,1)

A blunt way to avoid ordering mistakes at the call site is to pass every argument by name.

def m(x,y=2):
    print(x,y)
    return x+y
m(y=3,x=5)
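
Both errors are raised when the source is compiled, before any line of the file runs, so a try/except in the same file cannot catch them. The sketch below checks the four snippets from this section with the builtin compile(); `compiles` is a hypothetical helper, and the exact error wording may differ between Python versions:

```python
def compiles(source):
    """Return True if source parses as Python, False if it raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

call_bad = "m(x=3, 1)"            # positional argument after keyword argument
call_ok = "m(3, y=1)"
def_bad = "def m(x=1, y): pass"   # parameter without default after one with default
def_ok = "def m(y, x=1): pass"

results = [compiles(s) for s in (call_bad, call_ok, def_bad, def_ok)]
print(results)
```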

29.`FileNotFoundError: [Errno 2] No such file or directory:`

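One possible cause when running a Python program from cmd:
the code refers to a file by a relative path such as ".//",
but cmd was not cd-ed into the folder containing the program, so the relative path is resolved against a different working directory and the file is not found.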
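
A common way to make such paths independent of the working directory is to resolve them against the script's own folder. A minimal sketch with pathlib; `next_to_script` and the file names are placeholders for illustration:

```python
from pathlib import Path

def next_to_script(script_file, relative):
    """Resolve `relative` against the folder containing script_file, not the cwd."""
    return Path(script_file).resolve().parent / relative

# Inside a script this would be next_to_script(__file__, "data/input.txt");
# a literal path is used here only so the snippet runs on its own.
p = next_to_script("proj/run.py", "data/input.txt")
print(p)
```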

30.`ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()`

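Suppose a variable starts as None and is later assigned an np.array. Testing whether it is still None with if a or if not a then raises this error.

If a were a list instead, no error would occur, so this is specific to how numpy arrays evaluate their truth value.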

a=None
a=np.random.randint(5, size=(2,2, 2))
if not a:
    pass

One way to remove the error is to compare against None explicitly:

a=None
a=np.random.randint(5, size=(2,2, 2))
# note: type(a) != 'NoneType' compares a type with a string and is always True
if a is not None:
    pass

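31. `FileNotFoundError: [WinError 3]` 系统找不到指定的路径。 (The system cannot find the path specified.)

If this error appears while creating a folder, os.mkdir may have been used to create nested folders; it only creates the last level and fails when a parent is missing.
Nested folders need os.makedirs:

import os 
os.makedirs("./3/2")

os.makedirs also works for a single-level folder.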
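
The difference between the two calls can be seen in a throwaway directory. This sketch also uses exist_ok=True, so that repeated runs do not fail when the folder already exists:

```python
import os
import tempfile

base = tempfile.mkdtemp()            # throwaway parent folder for the demo
target = os.path.join(base, "3", "2")

try:
    os.mkdir(target)                 # parent "3" is missing, so this fails
    mkdir_failed = False
except FileNotFoundError:
    mkdir_failed = True

os.makedirs(target, exist_ok=True)   # creates "3" and then "3/2"
os.makedirs(target, exist_ok=True)   # second call is a no-op, no FileExistsError
print(mkdir_failed, os.path.isdir(target))
```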

32. `SyntaxError: invalid syntax` when a string contains double/single quotes

For example:

a="?E=FFD<7>DFFFDF@F?F4F?37F5?:6<F?3?E?2'."94<61'?"014)$/,F@E&DF1BF."

Error:

  File "<ipython-input-1-c62cf7282215>", line 1
    a="?E=FFD<7>DFFFDF@F?F4F?37F5?:6<F?3?E?2'."94<61'?"014)$/,F@E&DF1BF."
                                                ^
SyntaxError: invalid syntax

Fix: delimit the string with triple quotes, ''' or """.
For this example:

a='''?E=FFD<7>DFFFDF@F?F4F?37F5?:6<F?3?E?2'."94<61'?"014)$/,F@E&DF1BF.'''
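
Escaping the inner quote with a backslash is an equivalent alternative when triple quotes are inconvenient. A short illustration with a made-up string:

```python
s_triple = '''He said "it's done".'''     # triple quotes: no escaping needed
s_escaped = "He said \"it's done\"."      # same text with escaped double quotes
print(s_triple == s_escaped)
```

Triple quotes are usually easier for long strings such as the one above, where both kinds of quote appear several times.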

33.`OSError: [Errno 22] Invalid argument: 'E:\x01.pkl'`

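Two possible solutions. The cause is that \1 inside an ordinary string is an escape sequence for the character \x01.

33.1 Write every single backslash as a double backslash

pd.read_pickle('E:\\1.pkl')

33.2 Prefix the string with r (raw string, escapes disabled)

pd.read_pickle(r'E:\1.pkl')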
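
The effect of the escape can be checked directly on the strings, without touching any file. A small sketch comparing the spellings:

```python
bad = 'E:\1.pkl'       # "\1" is an octal escape, i.e. the single character chr(1)
raw = r'E:\1.pkl'      # raw string: the backslash stays a backslash
doubled = 'E:\\1.pkl'  # escaped backslash, same text as the raw string
forward = 'E:/1.pkl'   # forward slashes are also accepted in Windows paths

print(repr(bad), repr(raw))
```

The first form is one character shorter than intended, which is why the error message shows 'E:\x01.pkl'.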

34.`RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.`

The following example is adapted from the article "pytorch中retain_graph参数的作用" (the role of the retain_graph parameter in PyTorch).

import torch
from torch.autograd import Variable
x = torch.randn((1,4),dtype=torch.float32,requires_grad=True)
y = x ** 2
z = (x+y) * 4
output1 = y.mean()
output2 = z.sum()
output1.backward()   # no retain_graph: the graph buffers are freed after this call
output2.backward()

This raises the error above.

part of the autograd graph is shared between threads, i.e. run first part of forward single thread, then run second part in multiple threads, then the first part of graph is shared. In this case different threads execute grad() or backward() on the same graph might have issue of destroying the graph on the fly of one thread, and the other thread will crash in this case. Autograd will error out to the user similar to what call backward() twice with out retain_graph=True,and let the user know they should use retain_graph=True.

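Roughly: with two backward() calls, the first one frees the buffers of the computation graph when it finishes, so if the second backward() still needs that part of the graph, it fails.

The natural fix is to keep the graph alive after the first backward():
output1.backward(retain_graph=True).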

import torch
from torch.autograd import Variable
x = torch.randn((1,4),dtype=torch.float32,requires_grad=True)
y = x ** 2
z = (x+y) * 4
output1 = y.mean()
output2 = z.sum()
output1.backward(retain_graph=True)   # keep the graph buffers after this backward
output2.backward()

35.`TypeError: 'Tensor' object is not callable`

One way to hit it:

import torch
import torch.nn as nn

x_t=torch.randn((2,2))
y_t=torch.empty((2,2)).random_(2)

loss = nn.BCEWithLogitsLoss()
loss=loss(x_t,y_t)

x_v=torch.randn((2,2))
y_v=torch.empty((2,2)).random_(2)
loss=loss(x_v,y_v)

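loss was first bound to the loss function and then rebound to the value it returned, so it is no longer a loss function but just a tensor. Calling loss(x_v,y_v) again therefore fails.

A rather silly mistake; using separate names for the loss function and its value avoids it.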

36.`AttributeError: 'builtin_function_or_method' object has no attribute 'fftn'`

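Add this import (in PyTorch 1.7, torch.fft still refers to the old function unless the module is imported explicitly):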

import torch.fft

37.`TypeError: __main__.conv_relu is not a Module subclass`

import torch
import torch.nn as nn

class conv_relu(nn.Module):
    def __init__(self):
        super().__init__()    
        self.conv=nn.Conv2d(8, 8, 3,1,1)
        self.relu=nn.ReLU()
    def forward(self,x):
        x=self.conv(x)
        x=self.relu(x)
        return x

class dl(nn.Module):
    def __init__(self):
        super().__init__()
        comb=[]
        for _ in range(4):
            comb.append(conv_relu)
        self.comb=nn.Sequential(*comb)

    def forward(self,x):
        x=self.comb(x)
        return x
        
if __name__=='__main__':
    x=torch.rand((1,8,10,10))
    m=dl()
    out=m(x)

The example above raises TypeError: __main__.conv_relu is not a Module subclass.
The culprit turned out to be:

for _ in range(4):
    comb.append(conv_relu)

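The class itself is appended instead of an instance. It has to be instantiated, even though the constructor takes no arguments: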

for _ in range(4):
    comb.append(conv_relu())

37. pandas arithmetic on loaded data raises: `TypeError: unsupported operand type(s) for +: 'int' and 'str'`

Suppose D:\1.txt contains:

ID	A	B	     C
1	2	nan	     nan
2	2	nan	     nan
3	2	nan	     nan
4	4	nan	     nan
5	0	nan	     nan

Read it with pandas:

import pandas as pd 
File=r'D:\1.txt'
df = pd.read_csv(File, sep='\t', header=0,)
col1,col2=[1],[3]
df.values[:,col1]+df.values[:,col2]

Adding column 1 and column 3 raises the error in the title. Why?

Printing df.values shows:

array([[1, 2, nan, '     nan'],
       [2, 2, nan, '     nan'],
       [3, 2, nan, '     nan'],
       [4, 4, nan, '     nan'],
       [5, 0, nan, '     nan']], dtype=object)

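The nan in column 2 was read as np.nan, but the last column was read as strings (note the spaces in front of nan), so the whole array has dtype object and the addition fails.

This comes from pandas' default parsing. Forcing the data type solves it:

df= df.values.astype(np.float32)

Complete code: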

import pandas as pd 
import numpy as np
File=r'D:\1.txt'
df = pd.read_csv(File, sep='\t', header=0,)
df = df.values.astype(np.float32)
col1,col2=[1],[3]
df[:,col1]+df[:,col2]

38.`AttributeError: 'builtin_function_or_method' object has no attribute 'fftn'`

Fix (add the import):

import torch.fft

The official torch.fft documentation describes this as well.

38.`ValueError: a must be 1-dimensional`

import numpy as np
arr = np.array([[0,0,0],
       [0,1,1],
       [1,0,1],
       [1,1,0]])
arr = np.random.choice(arr)

This raises:

mtrand.pyx in numpy.random.mtrand.RandomState.choice()
ValueError: a must be 1-dimensional

Following https://www.icode9.com/content-4-335294.html, change it to:

import numpy as np
arr = np.array([[0,0,0],
       [0,1,1],
       [1,0,1],
       [1,1,0]])
ind = np.random.choice(np.arange(len(arr)))

arr = arr[ind]

Or use the random library directly:

import random 
import numpy as np
arr = np.array([[0,0,0],
       [0,1,1],
       [1,0,1],
       [1,1,0]])
arr=random.choice(arr)
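
With NumPy 1.17 or later, the Generator API offers a third option: its choice method accepts an axis argument and can sample whole rows from a 2-D array. A sketch, with the seed chosen only to make the demo repeatable:

```python
import numpy as np

arr = np.array([[0, 0, 0],
                [0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]])

rng = np.random.default_rng(0)   # seeded generator, for a repeatable demo
row = rng.choice(arr, axis=0)    # pick one whole row
print(row)
```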

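39.`ImportError: cannot import name 'evaluate' from 'test' (c:\python37\lib\test\__init__.py)`

Importing my own evaluate function from test.py failed.

The name test should not be used: it clashes with the test package that ships with Python.

To verify, type import test in an editor and use go-to-definition:
C:\Python37\Lib\test contains many files.

That is the cause of the error.

Renaming my own test file or folder should fix it.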
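
Which file an import name actually resolves to can also be checked from code with importlib. In this sketch `import_origin` is a hypothetical helper, and "json" merely stands in for the name being checked:

```python
import importlib.util

def import_origin(name):
    """Return the file a top-level `import name` would load, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

# "json" is a stand-in; pass "test" (or any other name) to check your own case.
print(import_origin("json"))
print(import_origin("surely_not_an_installed_module_xyz"))
```

If the printed path points into the Python installation rather than the project, the name is shadowed by an installed or standard-library module.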

40.`ValueError: dictionary update sequence element #0 has length 1; 2 is required`

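Building a dictionary from a sequence of key-value pairs with dict fails for the following code: the extra parentheses do not create a nested tuple, so dict receives the single pair ('z',12) rather than a sequence of pairs.

dict((('z',12)))

Change it to: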

dict([('z',12)])
# {'z': 12}
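
The role of the parentheses can be verified directly. A short sketch showing that only a trailing comma (or a list) makes a one-element sequence of pairs:

```python
pair = (('z', 12))        # redundant parentheses: still the 2-tuple ('z', 12)
one_item = (('z', 12),)   # trailing comma: a 1-tuple holding one pair

try:
    dict(pair)            # treats 'z' and 12 themselves as if they were pairs
    failed = False
except (ValueError, TypeError):
    failed = True

print(failed, dict(one_item), dict(z=12))
```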

41. Errors during compilation

The following error appeared while compiling the official CornerNet source code.

41.1 warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))

C:\Python37\lib\site-packages\torch\utils\cpp_extension.py:287: UserWarning: Error checking compiler version for cl: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte
  warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))

Following https://blog.csdn.net/tanmx219/article/details/100829920, change line 283 of C:\python37\lib\site-packages\torch\utils\cpp_extension.py to

match = re.search(r'(\d+)\.(\d+)\.(\d+)', compiler_info.decode('gbk').strip())

42. Loading weights fails after training with nn.DataParallel

RuntimeError: Error(s) in loading state_dict for ****:
	Missing key(s) in state_dict:

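Inspecting the weights loaded with torch.load shows that every key carries an extra "module." prefix, which makes model.load_state_dict fail to match the keys.

The fixes follow https://www.jianshu.com/p/e96a013ab5fd.

42.1 Change the saving code and regenerate the weight file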

torch.save(model.module.state_dict(), weight_file)

42.2 Strip the "module." prefix from the keys in code, then load/re-save the weights

state_dict = torch.load(weight_file)
# create new OrderedDict that does not contain `module.`
from collections import OrderedDict
new_state_dict = OrderedDict()
for k, v in state_dict.items():
    name = k[7:] # remove `module.`
    new_state_dict[name] = v
    
model.load_state_dict(new_state_dict)    
torch.save(model.state_dict(),weight_file)
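
Note that k[7:] cuts the first seven characters unconditionally, so it would damage the keys of a file that was not saved from a DataParallel model. A slightly more defensive sketch follows; `strip_module_prefix` is a hypothetical helper, and plain integers stand in for the tensors of a real state_dict:

```python
from collections import OrderedDict

def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        # Only cut when the key really starts with the prefix.
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned

# Integers stand in for tensors; a real state_dict maps parameter names to tensors.
demo = OrderedDict([("module.conv.weight", 1), ("module.conv.bias", 2), ("fc.weight", 3)])
cleaned = strip_module_prefix(demo)
print(list(cleaned))
```

The result can then be passed to model.load_state_dict in place of new_state_dict above.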

43. Error reading a pkl file with python/numpy

np.loadtxt(path,dtype=int)

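Reading this pkl file in one go with numpy used to work fine, but this time the program failed:

UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: illegal multibyte sequence

Following https://zhangvalue.blog.csdn.net/article/details/103764734, opening the file in binary mode and loading it with pickle solved it:

import pickle

# open in binary mode; the with block closes the file afterwards
with open(path, 'rb') as f:
    data = pickle.load(f)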

44.`ValueError: Usecols do not match columns, columns expected but not found:`

Reading a table file with pandas fails:

pd.read_csv(file,usecols=range(2,10))

很有可能是没有设置sep参数（默认sep=","），导致pandas读取时没有那么多列所以报错：

0        1\t3\tNA\tNA\tNA\tNA\tNA\tNA\tNA\tNA\tNA\tNA\t...                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
1        2\t3\tNA\tNA\tNA\tNA\tNA\tNA\tNA\tNA\tNA\tNA\t...                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2        3\t3\tNA\tNA\tNA\tNA\tNA\tNA\tNA\tNA\tNA\tNA\t...                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
3        4\t3\tNA\tNA\tNA\tNA\tNA\tNA\tNA\tNA\tNA\tNA\t...                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
4        5\t3\tNA\tNA\tNA\tNA\tNA\tNA\tNA\tNA\tNA\tNA\t...

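Correct usage: the file is tab-separated, so pass the separator explicitly. Otherwise every row is read as a single column (as in the output above) and usecols cannot match.

pd.read_csv(file, sep='\t', usecols=range(2, 10))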
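To make the difference concrete, here is a minimal sketch that reads the same tab-separated text twice, once with the default separator and once with sep='\t'. The column names (id, group, a, b) and the sample values are made up for illustration and are not taken from the original data file.

```python
import io

import pandas as pd

# Hypothetical tab-separated content; the column names are illustrative only.
tsv_text = "id\tgroup\ta\tb\n1\t3\t0.1\t0.2\n2\t3\t0.3\t0.4\n"

# Default separator is ',', so each line is parsed as one single field.
wrong = pd.read_csv(io.StringIO(tsv_text))
print(wrong.shape)          # one column only
print(list(wrong.columns))  # the whole header line, tabs included

# With the right separator the columns are split, and usecols can select them.
right = pd.read_csv(io.StringIO(tsv_text), sep="\t", usecols=range(2, 4))
print(list(right.columns))
```

If a read produces a single column whose name contains \t, the separator is the first thing to check before touching usecols.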

50. Environment configuration

50.1 Error checking compiler version for cl

c:\python37\lib\site-packages\torch\utils\cpp_extension.py:304: 
UserWarning: Error checking compiler version for cl: [WinError 2] 系统找不到指定的文件。
  warnings.warn(f'Error checking compiler version for {compiler}: {error}')

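The [WinError 2] message means "the system cannot find the file specified": the compiler cl.exe is not on PATH. See reference 15: add the directory that contains cl.exe to the PATH environment variable.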
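A quick way to confirm whether the fix took effect is to ask Python whether cl can be found on PATH. This is only a sketch using the standard library; the helper name find_on_path is made up for illustration, and the check has to be run from a freshly opened terminal so that the updated environment variables are picked up.

```python
import shutil


def find_on_path(executable):
    """Return the full path of `executable` if it is on PATH, else None."""
    return shutil.which(executable)


cl_path = find_on_path("cl")
if cl_path is None:
    print("cl not found on PATH; add the directory that contains cl.exe")
else:
    print("cl found at", cl_path)
```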

51. PyTorch Not Implemented Error

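See "PyTorch NotImplementedError in forward". One likely cause is that forward was accidentally defined inside __init__ (indented one level too deep), as in this faulty snippet:

import torch.nn as nn

class model(nn.Module):
    def __init__(self):
        super().__init__()
        ...

        # Bug: forward is indented under __init__, so it is only a local
        # function there and the class never overrides nn.Module.forward.
        def forward(self, x):
            y = ...
            return y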
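The mechanism can be reproduced without PyTorch. The sketch below uses a tiny stand-in Module class (an assumption for illustration, not the real torch.nn.Module) whose default forward raises NotImplementedError, which is the behaviour this error relies on. Broken repeats the indentation mistake and Fixed moves forward back to class level; all class and function names here are hypothetical.

```python
class Module:
    """Stand-in for torch.nn.Module; only the relevant behaviour is mimicked."""

    def forward(self, *inputs):
        raise NotImplementedError

    def __call__(self, *inputs):
        return self.forward(*inputs)


class Broken(Module):
    def __init__(self):
        super().__init__()

        # Indented one level too deep: a local function of __init__,
        # not a method of Broken.
        def forward(self, x):
            return x * 2


class Fixed(Module):
    def __init__(self):
        super().__init__()

    # Defined at class level, so it overrides Module.forward.
    def forward(self, x):
        return x * 2


def calls_ok(module, x):
    """Return True if calling the module works, False on NotImplementedError."""
    try:
        module(x)
        return True
    except NotImplementedError:
        return False


print(calls_ok(Broken(), 3))
print(calls_ok(Fixed(), 3))
```

In a real model the fix is the same: dedent forward so that it sits at the same indentation level as __init__.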


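Bugs really do hurt productivity. Collecting the errors I run into means that the next time one shows up, I can pin down the fix quickly.

It helps me, and hopefully it helps others too.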

