Errors I Ran Into in Python (Continuously Updated)

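1. TypeError: 'dict_keys' object does not support indexing

I ran into this in Chapter 3 (decision trees) of Machine Learning in Action. It comes down to the Python version: in Python 3, dict.keys() returns a dict_keys view, which cannot be indexed. This is the Python 2 way of writing it:
firstStr = myTree.keys()[0]

Python 3: convert the keys to a list first: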

firstStr = list(myTree.keys())[0]
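A quick check of both forms on a small hypothetical tree dict (made-up data, not the book's). next(iter(myTree)) is an alternative that takes the first key without building the whole list; since Python 3.7 dicts keep insertion order, so "first key" is well defined.

```python
# Hypothetical decision tree stored as nested dicts
myTree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}

try:
    firstStr = myTree.keys()[0]  # Python 2 style: raises TypeError on Python 3
except TypeError as e:
    print('TypeError:', e)

firstStr = list(myTree.keys())[0]  # Python 3: materialize the keys, then index
firstStr_alt = next(iter(myTree))  # alternative: iterate lazily, take the first key
print(firstStr, firstStr_alt)
```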

2. TypeError: write() argument must be str, not bytes

This error came up when storing data with pickle.

Code that raised the error:

try:
    with open(fileName, 'w') as fw:
        pickle.dump(inputTree, fw)
except IOError as e:
    print("File Error : " + str(e))

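Cause: pickle serializes to binary data (bytes), but the file was opened in text mode ('w'), which only accepts str.

Fix: open the file in binary mode ('wb'):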

try:
    with open(fileName, 'wb') as fw:
        pickle.dump(inputTree, fw)
except IOError as e:
    print("File Error : " + str(e))
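A minimal round-trip sketch, assuming you also want to load the tree back later. store_tree, load_tree, the file name and the sample tree are hypothetical; the point is that both writing ('wb') and reading ('rb') need binary mode.

```python
import os
import pickle
import tempfile


def store_tree(input_tree, file_name):
    # pickle.dump writes bytes, so open in binary write mode
    with open(file_name, 'wb') as fw:
        pickle.dump(input_tree, fw)


def load_tree(file_name):
    # pickle.load reads bytes, so open in binary read mode
    with open(file_name, 'rb') as fr:
        return pickle.load(fr)


tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
path = os.path.join(tempfile.mkdtemp(), 'tree.pkl')
store_tree(tree, path)
restored = load_tree(path)
print(restored == tree)  # True
```

Note that opening in 'w' mode fails with a TypeError, not an IOError, so the except IOError clause in the original code would not have caught it.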

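3. UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 199: illegal multibyte sequence

  • The file contains bytes that are not valid GBK, so the gbk codec cannot decode them. gbk is used at all because open() without an encoding argument falls back to the locale's preferred encoding, which is GBK (cp936) on a Chinese-locale Windows system.
import random

import bayes  # bayes.py from the book's code; textParse is assumed to be defined in this file
from numpy import array


def spamTest():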
    docList = []
    classList = []
    fullList = []
    for i in range(1, 26):
        wordList = textParse(open('email/spam/%d.txt' % i).read())
        docList.append(wordList)
        fullList.extend(wordList)
        classList.append(1)
        wordList = textParse(open('email/ham/%d.txt' % i).read()) # fails here with UnicodeDecodeError
        docList.append(wordList)
        fullList.extend(wordList)
        classList.append(0)
    vocabList = bayes.createVocabList(docList)
    trainingSet = list(range(50))
    testSet = []
    for i in range(10):
        randIndex = int(random.uniform(0, len(trainingSet)))
        testSet.append(trainingSet[randIndex])
        del trainingSet[randIndex]
    trainMat = []
    trainClasses = []
    for docIndex in trainingSet:
        trainMat.append(bayes.setOfWords2Vec(vocabList, docList[docIndex]))
        trainClasses.append(classList[docIndex])
    p0V, p1V, pSpam = bayes.trainNB0(array(trainMat), array(trainClasses))
    errorCount = 0
    for docIndex in testSet:
        wordVector = bayes.setOfWords2Vec(vocabList, docList[docIndex])
        if bayes.classifyNB(array(wordVector), p0V, p1V, pSpam) != classList[docIndex]:
            errorCount += 1
    print('the error rate is:', float(errorCount) / len(testSet))

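Attempt 1: use gb18030, which covers more characters than gbk. Failed.

wordList = textParse(open('email/ham/%d.txt' % i, encoding='gb18030').read())

Attempt 2: pass errors='ignore' so undecodable bytes are skipped. This works.

wordList = textParse(open('email/ham/%d.txt' % i, encoding='gb18030', errors='ignore').read())

Attempt 3: open the file and find out which character is invalid. I gave up on that one.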
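Attempt 3 can be automated. A minimal sketch: decode the raw bytes and read the start/end attributes of the UnicodeDecodeError (for the error above, start would be 199). find_bad_byte and the sample bytes are hypothetical; for a real file, pass in open(path, 'rb').read(). The sketch also contrasts errors='ignore' with errors='replace', and tries Latin-1, which maps every byte to a character and therefore never raises. Whether Latin-1 is the right encoding for these e-mails is an assumption to check, not something the original establishes.

```python
def find_bad_byte(data, encoding='gbk'):
    """Return (position, offending bytes) of the first decode error, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as e:
        # e.start / e.end delimit the bytes the codec rejected
        return e.start, data[e.start:e.end]
    return None


# Hypothetical sample: ASCII text containing byte 0xae (the registered sign in Latin-1)
sample = b'Hello \xae there'

pos, bad = find_bad_byte(sample)
print(pos, bad)

# 'ignore' silently drops the bad bytes; 'replace' leaves a visible U+FFFD marker
ignored = sample.decode('gbk', errors='ignore')
replaced = sample.decode('gbk', errors='replace')

# Latin-1 maps each byte to the code point with the same value, so it never raises
as_latin1 = sample.decode('latin-1')
print(as_latin1)  # Hello ® there
```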

4. TypeError: 'range' object doesn't support item deletion

# spamTest():
def spamTest():
    docList = []
    classList = []
    fullList = []
    for i in range(1, 26):
        wordList = textParse(open('email/spam/%d.txt' % i, encoding='gb18030', errors='ignore').read())
        docList.append(wordList)
        fullList.extend(wordList)
        classList.append(1)
        wordList = textParse(open('email/ham/%d.txt' % i, encoding='gb18030', errors='ignore').read())
        docList.append(wordList)
        fullList.extend(wordList)
        classList.append(0)
    vocabList = bayes.createVocabList(docList)
    trainingSet = range(50) # this line needs to change
    testSet = []
    for i in range(10):
        randIndex = int(random.uniform(0, len(trainingSet)))
        testSet.append(trainingSet[randIndex])
        del trainingSet[randIndex] # this line raises the error
    trainMat = []
    trainClasses = []
    for docIndex in trainingSet:
        trainMat.append(bayes.setOfWords2Vec(vocabList, docList[docIndex]))
        trainClasses.append(classList[docIndex])
    p0V, p1V, pSpam = bayes.trainNB0(array(trainMat), array(trainClasses))
    errorCount = 0
    for docIndex in testSet:
        wordVector = bayes.setOfWords2Vec(vocabList, docList[docIndex])
        if bayes.classifyNB(array(wordVector), p0V, p1V, pSpam) != classList[docIndex]:
            errorCount += 1
    print('the error rate is:', float(errorCount) / len(testSet))

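On Python 3.x this raises: 'range' object doesn't support item deletion

Cause: in Python 3.x, range() returns a range object, not a list. Range objects are immutable, so del cannot remove items from them.

Fix:

Change trainingSet = range(50) to trainingSet = list(range(50)).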
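A minimal sketch of the hold-out split from spamTest, assuming the same 50 documents and 10 test items. It uses random.randrange instead of int(random.uniform(0, n)), because the Python docs note that uniform's upper end point may be included through floating-point rounding, which could yield an out-of-range index. The random.sample variant at the end is an alternative I am adding, not the book's code.

```python
import random

# range objects are immutable sequences: indexing works, deletion does not
r = range(50)
try:
    del r[0]
except TypeError as e:
    print('TypeError:', e)

# The fix from the text: materialize the range as a list, which supports del
trainingSet = list(range(50))
testSet = []
for _ in range(10):
    randIndex = random.randrange(len(trainingSet))  # 0 <= randIndex < len(trainingSet)
    testSet.append(trainingSet[randIndex])
    del trainingSet[randIndex]

# Alternative: draw 10 distinct indices directly and keep the rest for training
testIdx = random.sample(range(50), 10)
testIdxSet = set(testIdx)
trainIdx = [i for i in range(50) if i not in testIdxSet]
```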

5. TypeError: 'numpy.float64' object cannot be interpreted as an integer

Code that raised the error: stochastic gradient ascent

# Stochastic gradient ascent
def stocGradAscent0(dataMatrix, classLabels):

    m, n = shape(dataMatrix)
    alpha = 0.01
    weights = ones(n)
    for i in range(m):
        h = sigmoid(sum(dataMatrix[i] * weights))
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatrix[i]
    return weights

Cause: error is a numpy.float64, weights is a <class 'numpy.ndarray'>, and dataMatrix[i] is a <class 'list'>.

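In Python, multiplying a list L by an integer n repeats the list, producing a list of length n*len(L); multiplying a list by a float is not defined, hence the error. Repetition is not what we want anyway: we want every element of the current row multiplied by error.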
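The difference is easy to see in isolation. This sketch uses made-up values, and the failing case uses a plain Python float, whose behavior with lists is fixed by the language itself.

```python
import numpy as np

row_list = [1.0, 2.0, 3.0]      # one row of dataMatrix as a plain list
row_array = np.array(row_list)  # the same row as a NumPy array
error = np.float64(0.5)         # stands in for the float64 error term

repeated = row_list * 2         # int * list repeats the list: length 2 * 3 = 6
scaled = error * row_array      # float64 * ndarray scales every element

try:
    row_list * 0.5              # float * list is a TypeError in Python
except TypeError as e:
    print('TypeError:', e)      # can't multiply sequence by non-int of type 'float'
```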

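At bottom this is the problem of mixing Python lists with NumPy arrays. Explicitly converting dataMatrix to an array is enough to fix it (you could also convert it before passing it in; cue a rant about Python's type system):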

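from numpy import array, exp, ones, shape


def sigmoid(inX):
    # Logistic function (assumed to match the sigmoid helper used with this code)
    return 1.0 / (1 + exp(-inX))


# Stochastic gradient ascent
def stocGradAscent0(dataMatrix, classLabels):
    # Explicit conversion so that arrays and lists are never mixed
    dataMatrix = array(dataMatrix)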
    m, n = shape(dataMatrix)
    alpha = 0.01
    weights = ones(n)
    for i in range(m):
        h = sigmoid(sum(dataMatrix[i] * weights))
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatrix[i]
    return weights



Reposted from blog.csdn.net/GreenHandCGL/article/details/79818514