Building a neural-network-based handwriting-recognition calculator from scratch

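First, what I knew before starting: I had used the common Linux commands, written a few simple shell scripts, and could program in C and a little Java. My other skills have nothing to do with this project, so I won't list them (and honestly, apart from Linux, the ones above don't have much to do with it either). This post mainly records how I built the thing and sums up some lessons learned, so it may be a bit long-winded. The end result uses a neural network to recognize handwritten digits and the plus, minus, multiply and divide symbols; once that works, computing the answer to an expression is easy.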

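The initial idea: for a handwriting calculator, the main difficulty is the handwriting recognition; the UI is probably not a big problem. I had never studied the topic, but I had heard that the best way to solve this kind of problem is a neural network. I had no idea how to actually do it. In a situation like this you can't expect to sit down and think up a rough plan, because you have no foundation at all, so I just started searching on Baidu with keywords like "handwritten digit recognition" and "neural network introduction". After a few searches, refining the wording as you go, you will find some useful information.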

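After searching for quite a while, I found very little C-based material on this; almost everything was in Python, so I decided I at least had to learn Python. At that point I paused the searching: once it's settled that the project will be written in Python, get that done first and then think about the next step. Don't expect to get the plan for the whole project in one go.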

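For learning Python, see my other post: https://blog.csdn.net/sinat_30457013/article/details/89523390 . I'm not very patient, so I studied for only two days and didn't even finish the syntax, but I had a feel for it, so I moved on. I skimmed the rest and will come back to read it carefully when I need it.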

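Searching again at this point, I happened to find the book Neural Networks and Deep Learning; someone has translated a few chapters online: http://www.cnblogs.com/pathrough/p/5855084.html . I skimmed it and found that it implements handwritten digit recognition in only 74 lines. If I could extend it a bit, it might be able to recognize the four arithmetic operators as well as digits. The problem was that I had no dataset for the operators. As luck would have it, someone had done something similar at https://blog.csdn.net/qq_34919953/article/details/81048259 and provided the missing dataset, which at least told me the approach was feasible. So the first step was to get the book's example code running.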

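The Neural Networks and Deep Learning tutorial says numpy is required, so I set out to install it. Installing with pip looked convenient, so I installed pip first, with the following commands: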

$ curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py # download the install script

$ sudo python get-pip.py # run the install script

No problems there. Next, to install numpy, I entered the command suggested online:

python -m pip install --user numpy scipy matplotlib ipython jupyter pandas sympy nose

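The install went wrong with a lot of errors, which I couldn't be bothered to read; I wanted to see whether there was another way first. It turned out that installing Anaconda would solve it: Anaconda is a Python distribution that bundles these scientific computing packages, so you don't have to install them yourself. I decided to install it, but the download failed several times.

In the meantime I looked at pip again. Since I had run the install command twice, I figured it had downloaded duplicate packages and wanted to delete them first. From the output of the earlier install command, they were under .cache in my home directory (note that you need ls -al to see this folder). Since it's a cache, I figured deleting it would be harmless, so I deleted everything in it. Then I ran pip list to see which packages were already installed and was surprised to find that numpy was there; it was matplotlib and the packages after it that had failed. I entered the following commands to test it: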

>>> from numpy import *
>>> eye(4)
array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

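from numpy import * imports every public name from the numpy library.

eye(4) creates a 4*4 identity matrix (ones on the diagonal, zeros elsewhere).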

That worked, so I decided to just try using it for now.

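Continuing with the tutorial, I first had to copy the project files to the Linux virtual machine, and another problem appeared: unzip failed on the zip file. A search said unzip can't handle files over 2 GB, but mine was only 18 MB. I also tried jar and 7za, and neither worked. Fortunately there weren't many files, so I fell back on the dumbest method: unzip on Windows and drag everything into the VM. Then I entered the tutorial's commands to load the data and got a CRC check failure. From what I found, extracting on Linux carries some extra attributes along, so I guessed the cause was extracting on Windows and then dragging the files in.

I thought of another approach: convert the zip to tar.gz and drag that into the VM. I tried an online conversion site; the result did extract properly, but what came out was the zip file... I was running out of ideas. Since transferring the zip didn't work, I tried downloading directly into Linux with git. What git fetched was the whole project, uncompressed, and that did work.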

The next few commands all ran without trouble, and the reported recognition accuracy was around 95%.

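Understanding this code well enough to modify it was going to be harder; I hadn't even finished reading about Python syntax and didn't know where to start.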

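After some searching, this was my plan: first, following this post: https://www.cnblogs.com/xianhan/p/9145966.html , work out the format of the MNIST dataset and how to read it, then learn the data format used in the tutorial. With both formats clear, I could figure out how to extend the data and how to feed a particular sample to the network and look at the result. That meant first installing the many libraries used in the tutorial and the blog post, so on reflection I decided to just install Anaconda after all.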

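Installing Anaconda: I dragged the installer into the VM and found that the size didn't match. wc -c reported far fewer bytes than the expected 600-odd MB. I suspect unzip kept failing earlier for the same reason: drag-and-drop file transfer into the VM is unreliable. After some searching, I transferred the file through a shared folder instead (details in another post: https://blog.csdn.net/sinat_30457013/article/details/89523631 ).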
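Since a silently truncated transfer is the suspected cause of both the unzip failures and the size mismatch, it may be worth checking each transferred file before using it. A minimal sketch, assuming Python can be run on both the Windows host and the Linux VM; file_fingerprint is a hypothetical helper name, not part of any tool mentioned in this post:

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for the file at path."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # read in chunks so a 600 MB installer does not have to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()
```

Run it on the original file and on the copy; if the two results differ in either field, the transfer damaged the file.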

For how to install Anaconda, see another post: https://blog.csdn.net/sinat_30457013/article/details/89523480 .

For the MNIST dataset format, see another post: https://blog.csdn.net/sinat_30457013/article/details/89523551
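For orientation, the raw MNIST image files use the IDX format: a big-endian header of four 32-bit integers (magic number 2051, image count, rows, columns) followed by one unsigned byte per pixel. The sketch below parses such data from bytes. parse_idx_images is a hypothetical name, and the book's example code loads a pickled version of the data instead, so this only illustrates the raw layout:

```python
import struct


def parse_idx_images(data):
    """Parse raw IDX image bytes into a list of images (each a flat list of pixels)."""
    magic, count, rows, cols = struct.unpack('>IIII', data[:16])
    if magic != 2051:
        raise ValueError('not an IDX image file: magic=%d' % magic)
    size = rows * cols
    images = []
    for i in range(count):
        start = 16 + i * size  # pixel data begins right after the 16-byte header
        images.append(list(data[start:start + size]))
    return images, rows, cols
```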

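With the data format understood, it should be possible to merge the downloaded operator data with the digit data and then modify the neural network code accordingly. Obviously, the hard part is modifying the model. I finished chapter 1 of the book and skimmed the code; for that process see another post: https://blog.csdn.net/sinat_30457013/article/details/89523533 .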

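After reading it, I judged that I most likely only needed to modify mnist_loader and not network. Everything in network.py is vector or matrix arithmetic and doesn't depend on the specific length of the output. But I hadn't really read the training code, so I wasn't very sure. I agonized over it for a long time without settling it, so the only option was to try.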
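To see why this is plausible: the layer sizes only determine the shapes of the weight matrices and bias vectors, and the forward pass is the same matrix arithmetic whatever those shapes are. A small sketch under that assumption, using random parameters. init_params is a hypothetical helper written for this illustration (not a copy of network.py), and the 30 hidden neurons are just an example value:

```python
import numpy as np


def init_params(sizes, seed=0):
    """Random biases and weights for a fully connected net with the given layer sizes."""
    rng = np.random.RandomState(seed)
    biases = [rng.randn(y, 1) for y in sizes[1:]]
    weights = [rng.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]
    return biases, weights


def feedforward(a, biases, weights):
    """The same forward pass, however many output neurons there are."""
    for b, w in zip(biases, weights):
        a = 1.0 / (1.0 + np.exp(-(np.dot(w, a) + b)))
    return a


x = np.zeros((784, 1))
for n_out in (10, 14):
    biases, weights = init_params([784, 30, n_out])
    print(n_out, feedforward(x, biases, weights).shape)
```

Only the last entry of the size list changes between the two runs; the forward-pass code is untouched.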

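First, read the symbol images and save the data to a file that can be loaded directly as a variable, for convenient use later. For the reading and saving code see another post: https://blog.csdn.net/sinat_30457013/article/details/89523604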

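At this point the preparation was more or less done, and it was time to modify mnist_loader.py to merge in the operator data. For extending the data see another post: https://blog.csdn.net/sinat_30457013/article/details/89523571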
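The merging idea, roughly: give the operator images labels that continue after the digits and mix them into the digit samples before training. A sketch with hypothetical names, assuming each sample is an (image, label) pair and that the symbol set is labeled 0 to 3 on disk; the linked post describes the actual changes:

```python
import random


def merge_datasets(digit_samples, symbol_samples, offset=10, seed=None):
    """Combine digit and symbol samples into one shuffled list.

    Symbol labels 0..3 are shifted by offset so they become 10..13.
    """
    merged = list(digit_samples)
    merged.extend((image, label + offset) for image, label in symbol_samples)
    # shuffle so the symbols are not all at the end of the training data
    random.Random(seed).shuffle(merged)
    return merged
```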

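Now the neural network we need can be trained. Just type the commands from the book; the one thing to change is the number of outputs when setting up the network, which I changed from 10 to 14.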
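The output count matters because each training label is turned into a column vector with one entry per class. A sketch of that encoding for 14 classes, modeled on the 10-class helper in the book's loader; the n_classes parameter is added here for illustration:

```python
import numpy as np


def vectorized_result(j, n_classes=14):
    """Return an n_classes*1 vector with 1.0 at index j and 0.0 elsewhere."""
    e = np.zeros((n_classes, 1))
    e[j] = 1.0
    return e
```

If the labels were still encoded with 10 entries while the network has 14 outputs, the shapes would not match during training, so the two numbers have to change together.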

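The next problem was how to export the model. If you have read the book and the example code carefully, you'll notice that everything needed to export and use the model has already been done: exporting the model just means saving the weights and biases variables, and the example code already has a function for using the model. The code is as follows: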

import pickle

import numpy as np


def savPara(weights, biases):
    """Save the weights and biases to weights.data and biases.data."""
    # pickle each variable into its own file
    with open('weights.data', 'wb') as f:
        pickle.dump(weights, f)
    with open('biases.data', 'wb') as f:
        pickle.dump(biases, f)


def loadPara(filename):
    """Load one pickled variable (weights or biases) from filename."""
    with open(filename, 'rb') as f:
        return pickle.load(f)


def feedforward(a, biases, weights):
    """Return the output of the network if "a" is input. "a" has shape 784*1."""
    for b, w in zip(biases, weights):
        a = sigmoid(np.dot(w, a) + b)
    return a


def sigmoid(z):
    """The sigmoid function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

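The feedforward function takes the model and an input and returns the network's output vector; the index of its largest entry is the recognized class.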
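A quick way to check the export is a round trip: save the parameters, load them back, and confirm that the output is unchanged. The sketch below is self-contained, so it declares its own small feedforward and uses tiny random parameters rather than a trained model. The temporary file, the single-file dict layout and the 4-3-2 layer sizes are arbitrary choices for the demonstration:

```python
import os
import pickle
import tempfile

import numpy as np


def feedforward(a, biases, weights):
    for b, w in zip(biases, weights):
        a = 1.0 / (1.0 + np.exp(-(np.dot(w, a) + b)))
    return a


def round_trip_ok(biases, weights, x):
    """Pickle the parameters, reload them, and compare outputs on input x."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'params.data')
        with open(path, 'wb') as f:
            pickle.dump({'biases': biases, 'weights': weights}, f)
        with open(path, 'rb') as f:
            loaded = pickle.load(f)
    before = feedforward(x, biases, weights)
    after = feedforward(x, loaded['biases'], loaded['weights'])
    return np.array_equal(before, after)


rng = np.random.RandomState(0)
biases = [rng.randn(3, 1), rng.randn(2, 1)]
weights = [rng.randn(3, 4), rng.randn(2, 3)]
print(round_trip_ok(biases, weights, rng.randn(4, 1)))
```

Keeping both variables in one file is a design choice for the sketch: the weights and biases can then never come from two different training runs.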

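That settles the recognition part; what remains is the calculator UI and so on. Since I had just learned Python, I decided to write it in Python and naturally found PyQt through Baidu. I won't go into detail on the GUI code, only the general idea and the blog posts I referred to: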

For the basic handwriting canvas, see: https://www.jb51.net/article/126189.htm

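Adding buttons is easy to look up. There are two main buttons: one clears the canvas, the other recognizes the input and computes the result. Clearing only needs the pen switched to white and everything drawn again. For recognition, you have to get the image data for each symbol and convert it to a 28*28 greyscale image. The rest is easy, but getting a separate image for each symbol is a hassle, so I took a shortcut: I drew a grid on the canvas, and you write one symbol per cell. Everything else has a ready-made API.

A note on screenshots: the most common choice is probably ImageGrab, but it did not work on Linux when I tried. pyscreenshot can replace it, but it is very slow and caused problems when called directly, so here I use PyQt's own API to take the screenshot and get the image data.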
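The grid shortcut reduces symbol segmentation to simple arithmetic: each cell starts one square plus one split line after the previous one. A sketch of that calculation, using the 200-pixel squares and 10-pixel split lines from the code in this post; cell_rect is a hypothetical helper name:

```python
def cell_rect(row, col, square=200, split=10):
    """Return (x, y, width, height) of the drawing cell at (row, col)."""
    step = square + split  # one cell plus the split line after it
    return col * step, row * step, square, square


# the first three cells of the top row, left to right
rects = [cell_rect(0, col) for col in range(3)]
print(rects)  # [(0, 0, 200, 200), (210, 0, 200, 200), (420, 0, 200, 200)]
```

These are the same offsets that the screenshot calls use for the first three cells.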

The whole code is below (still being revised; for now it just reads and recognizes the first three cells):

import sys

from PyQt5 import QtGui
from PyQt5.QtCore import QCoreApplication, Qt, pyqtSlot
from PyQt5.QtGui import QPainter, QPen
from PyQt5.QtWidgets import QApplication, QPushButton, QWidget

import recognize


class Example(QWidget):

    def __init__(self):
        super(Example, self).__init__()
        self.splitLineWidth = 10
        self.square = 200
        # set the window's width and height: a 5*5 grid of 200*200 squares
        side = 5*self.square + 4*self.splitLineWidth
        self.resize(side, side)
        self.setWindowTitle("calculator")
        self.setMouseTracking(False)
        # list of recorded mouse positions
        self.pos_xy = []
        palette1 = QtGui.QPalette()
        palette1.setColor(self.backgroundRole(), Qt.white)
        self.setPalette(palette1)
        QCoreApplication.setAttribute(Qt.AA_ShareOpenGLContexts)
        self.clearFlag = 0
        self.initUI()
        # save the paint window's winId
        self.winid = self.winId()

    def initUI(self):
        clrButton = QPushButton("clear", self)
        clrButton.move(490, 920)
        clrButton.clicked.connect(self.clear_on_click)
        calButton = QPushButton("calculate", self)
        calButton.move(490, 950)
        calButton.clicked.connect(self.calculate_on_click)
        self.show()

    # invoked by the system when update() (in mouseMoveEvent) is executed
    def paintEvent(self, event):
        painter = QPainter()
        painter.begin(self)
        pen = QPen(Qt.black, 20, Qt.SolidLine, Qt.RoundCap, Qt.RoundJoin)
        painter.setPen(pen)
        if self.clearFlag == 0:
            if len(self.pos_xy) > 1:
                point_start = self.pos_xy[0]
                for pos_tmp in self.pos_xy:
                    point_end = pos_tmp
                    # (-1, -1) marks the end of a stroke (mouse released)
                    if point_end == (-1, -1):
                        point_start = (-1, -1)
                        continue
                    if point_start == (-1, -1):
                        point_start = point_end
                        continue
                    painter.drawLine(point_start[0], point_start[1], point_end[0], point_end[1])
                    point_start = point_end
        elif self.clearFlag == 1:
            # redraw everything in white, then forget the recorded points
            pen = QPen(Qt.white, 30, Qt.SolidLine, Qt.RoundCap, Qt.RoundJoin)
            painter.setPen(pen)
            if len(self.pos_xy) > 1:
                point_start = self.pos_xy[0]
                for pos_tmp in self.pos_xy:
                    if pos_tmp[0] > 0:
                        point_end = pos_tmp
                        painter.drawLine(point_start[0], point_start[1], point_end[0], point_end[1])
                        point_start = point_end
            self.pos_xy = []
            self.clearFlag = 0
        pen = QPen(Qt.black, 10, Qt.SolidLine, Qt.RoundCap, Qt.RoundJoin)
        painter.setPen(pen)
        side = 5*self.square + 4*self.splitLineWidth
        for i in range(1, 5):
            # centre of the i-th split line; // keeps the coordinate an int for drawLine
            offset = i*self.square + (i - 1)*self.splitLineWidth + self.splitLineWidth//2
            painter.drawLine(0, offset, side, offset)  # draw the horizontal line
            painter.drawLine(offset, 0, offset, side)  # draw the vertical line
        painter.end()

    def mouseMoveEvent(self, event):
        pos_tmp = (event.pos().x(), event.pos().y())
        self.pos_xy.append(pos_tmp)
        self.update()

    def mouseReleaseEvent(self, event):
        pos_test = (-1, -1)
        self.pos_xy.append(pos_test)
        self.update()

    @pyqtSlot()
    def clear_on_click(self):
        # clear the painter
        self.clearFlag = 1
        self.update()

    def calculate_on_click(self):
        screen = QApplication.primaryScreen()
        # pix = screen.grabWindow(QApplication.desktop().winId(), 0, 0, 100, 100)
        # can be used to capture the full screen or any area of the screen.
        # Grabbing by winId captures the paint window only,
        # which is why the winId is saved in __init__().
        for i in range(3):
            x = i*(self.square + self.splitLineWidth)
            pix = screen.grabWindow(self.winid, x, 0, self.square, self.square)
            pix.save("%d.jpg" % i)
        for i in range(3):
            recognize.resizeAndGrey('%d.jpg' % i)
        for i in range(3):
            print(recognize.recog255(i))


if __name__ == "__main__":
    app = QApplication(sys.argv)
    pyqt_learn = Example()
    pyqt_learn.show()
    app.exec_()
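The calculate handler in this GUI code only prints the recognized class indices. Turning them into a result could look like the sketch below, which assumes the mapping used in this post (classes 0 to 9 are digits, 10 to 13 are plus, minus, multiply, divide) and handles a single "number operator number" expression. evaluate_classes is a hypothetical function, not part of the project code:

```python
import operator

OPERATORS = {10: operator.add, 11: operator.sub,
             12: operator.mul, 13: operator.truediv}


def evaluate_classes(classes):
    """Evaluate a list of class indices such as [1, 2, 10, 3] (12 + 3)."""
    op_positions = [i for i, c in enumerate(classes) if c in OPERATORS]
    if len(op_positions) != 1:
        raise ValueError('expected exactly one operator')
    pos = op_positions[0]
    left, right = classes[:pos], classes[pos + 1:]
    if not left or not right:
        raise ValueError('operator needs a number on both sides')

    def to_number(digits):
        # combine digit classes into a decimal number, e.g. [1, 2] -> 12
        value = 0
        for d in digits:
            value = value * 10 + d
        return value

    return OPERATORS[classes[pos]](to_number(left), to_number(right))


print(evaluate_classes([1, 2, 10, 3]))  # 15
```

Division by zero still raises ZeroDivisionError here, so a real UI would need to catch it and show a message.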

The GUI code calls a few helper functions I wrote myself; the recognize module is as follows:

import numpy

import read_data
import savAndLoadModel
import pyscreenshot as ImageGrab
from PIL import Image


def screenshot(pos, name):
    # without the bbox parameter, the default setting is full screen
    im = ImageGrab.grab(bbox=pos)
    im.save('/home/otagan/bin/recognize/%s.jpg' % name, 'JPEG')


def recog255(x):
    """Recognize image number x; for images with a white (255) background."""
    w = savAndLoadModel.loadPara('weights.data')
    b = savAndLoadModel.loadPara('biases.data')
    im = read_data.get_imlist('/home/otagan/bin/recognize/')
    img = read_data.get_img255(im)
    a = numpy.reshape(img[x], (784, 1))
    n = numpy.argmax(savAndLoadModel.feedforward(a, b, w))
    return n


def recog0(x):
    """Recognize image number x; for images with a black (0) background."""
    w = savAndLoadModel.loadPara('weights.data')
    b = savAndLoadModel.loadPara('biases.data')
    im = read_data.get_imlist('/home/otagan/bin/recognize/')
    img = read_data.get_img0(im)
    a = numpy.reshape(img[x], (784, 1))
    n = numpy.argmax(savAndLoadModel.feedforward(a, b, w))
    return n


def resizeAndGrey(name):
    """Convert the image to 28*28 greyscale and set every pixel above 100 to 255."""
    im = Image.open(name).convert('L').resize((28, 28))
    im_array = numpy.array(im)
    # vectorized form of the original per-pixel loop
    im_array[im_array > 100] = 255
    im = Image.fromarray(im_array)
    im.save(name)

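Since I haven't posted all the code, you can't expect to copy and paste from here and have it run, but if you have read carefully, reproducing the same result shouldn't be hard.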

Here is the final UI:

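As shown in the figure, I use 10 to 13 for plus, minus, multiply and divide; computing the result isn't written yet (at this point what's left is minor). To be honest, though, while accuracy on the test data is 95%, recognition of my own handwritten images is very poor. I tried many things, such as changing the pen width (which does have some effect) and using a higher-quality resize function, but the results were still bad: 60% accuracy at best. I never solved this, so I didn't take it any further.

I also noticed that the white background of my images had slight noise, so before storing with img.save (the method that comes with from PIL import Image), I set every pixel below a certain value to zero. For some reason the stored image still showed some noise, and when I read the image back the values still jittered. For example, if I set every value above 10 to 120, printing the array before saving shows everything as expected (the values really are 120), but after saving as an image there is some noise, and the data read back from the image shows it too (mostly 120, with a few values jittering around 120). I never figured this out...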
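One likely explanation for the jitter, not verified against the original images: the files are saved as .jpg, and JPEG compression is lossy, so pixel values read back from the file are generally not exactly the ones that were written, especially near sharp edges. PNG, by contrast, is lossless. The sketch below round-trips a small test image through both formats in memory; the image contents are made up for the demonstration and round_trip is a hypothetical helper:

```python
import io

import numpy as np
from PIL import Image


def round_trip(array, fmt):
    """Save a uint8 greyscale array in the given format and read it back."""
    buffer = io.BytesIO()
    Image.fromarray(array).save(buffer, format=fmt)
    buffer.seek(0)
    return np.array(Image.open(buffer))


# a 28*28 image with a flat 120 background and a dark stroke down the middle
image = np.full((28, 28), 120, dtype=np.uint8)
image[:, 12:16] = 0

png_exact = np.array_equal(round_trip(image, 'PNG'), image)
jpeg_diff = round_trip(image, 'JPEG').astype(int) - image.astype(int)
print(png_exact)                    # PNG is lossless, so this prints True
print(int(np.abs(jpeg_diff).max())) # largest per-pixel change caused by JPEG
```

If this is the cause, saving the cell images with a .png name instead of .jpg should remove the jitter, provided the rest of the code (for example the image listing in read_data) accepts the new extension.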
