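Preface
CIFAR-10 is a dataset for general object recognition collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. Deep learning rests on two pillars: data and models.
The CIFAR-10 dataset contains 60000 images in total:
1. Image size: 32 x 32 pixels
2. Image depth: three-channel RGB color images
3. The 60000 images are divided into 10 classes, as shown in the figure below:
The 60000 images consist of:
1. 50000 training samples
2. 10000 test samples (used as the validation set)
Notes:
1. CIFAR-10 is a dataset for recognizing [general objects]
2. As a result, the main strength of this dataset and its network model is that [object recognition] learned here carries over easily to other everyday objects
3. The 10-class problem can also be extended to 100 classes, or even 1000 classes and more.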
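One thing to note:
Each batch of the dataset used in this example is stored as follows:
1. A 10000 x 3072 numpy array: one row of pixel values for each of the 10000 images
2. The element type is uint8
3. Each 3072-value row holds one 32 x 32 color image (3 x 32 x 32 == 3 x 1024 == 3072)
4. In each row, the first 1024 values are the R channel, the middle 1024 values are the G channel, and the last 1024 values are the B channel
5. A final point:
The CIFAR-10 example is suited only to classifying [small images], just as the MNIST example covered earlier is aimed at [handwritten digit recognition].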
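The row layout described above can be checked with a small numpy sketch. It assumes a row is exactly 3072 values in R, G, B plane order; the helper name `row_to_image` and the synthetic row are made up for illustration and are not part of CIFAR-10 or Caffe.

```python
import numpy as np


def row_to_image(row):
    """Turn one 3072-value row (R plane, G plane, B plane) into a 32x32x3 image."""
    row = np.asarray(row, dtype=np.uint8)
    if row.shape != (3072,):
        raise ValueError("expected 3072 values, got shape %r" % (row.shape,))
    # (channel, height, width) first, then move the channel axis last.
    return row.reshape(3, 32, 32).transpose(1, 2, 0)


# Synthetic row for illustration only: every R value is 10, G is 20, B is 30.
demo_row = np.concatenate(
    [np.full(1024, 10), np.full(1024, 20), np.full(1024, 30)]
).astype(np.uint8)
demo_image = row_to_image(demo_row)
print(demo_image.shape, demo_image[0, 0])
```

Every pixel of `demo_image` should come out as the triple (10, 20, 30), which confirms that the three 1024-value planes end up as the three color channels.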
Downloading the data
First, the data/cifar10 directory contains a script, get_cifar10.sh, whose source is as follows:
#!/usr/bin/env sh
# This scripts downloads the CIFAR10 (binary version) data and unzips it.
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
echo "Downloading..."
wget --no-check-certificate http://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz
echo "Unzipping..."
tar -xf cifar-10-binary.tar.gz && rm -f cifar-10-binary.tar.gz
mv cifar-10-batches-bin/* . && rm -rf cifar-10-batches-bin
# Creation is split out because leveldb sometimes causes segfault
# and needs to be re-created.
echo "Done."
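The --no-check-certificate option tells wget not to validate the server certificate.
When the download completes, data/cifar10 holds the extracted files (data_batch_1.bin through data_batch_5.bin, test_batch.bin, and batches.meta.txt):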
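To look inside the downloaded files, here is a minimal sketch of a reader for the binary version. It assumes the record layout documented on the CIFAR-10 page (1 label byte followed by 3072 pixel bytes per image); `parse_cifar10_binary` is a hypothetical helper, and the two-record buffer is synthetic.

```python
import numpy as np

RECORD_BYTES = 1 + 3 * 32 * 32  # 1 label byte + 3072 pixel bytes


def parse_cifar10_binary(raw):
    """Split a raw byte string into (labels, images) arrays."""
    data = np.frombuffer(raw, dtype=np.uint8)
    if data.size == 0 or data.size % RECORD_BYTES != 0:
        raise ValueError("buffer is not a whole number of 3073-byte records")
    records = data.reshape(-1, RECORD_BYTES)
    labels = records[:, 0]
    images = records[:, 1:].reshape(-1, 3, 32, 32)  # channel, height, width
    return labels, images


# Synthetic buffer with two records (labels 3 and 7), for illustration only.
fake_bytes = bytes([3]) + bytes(3072) + bytes([7]) + bytes([255]) * 3072
labels, images = parse_cifar10_binary(fake_bytes)
print(labels.tolist(), images.shape)
```

With a real file, the input would be something like `open("data/cifar10/data_batch_1.bin", "rb").read()`, which should yield 10000 records.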
Data processing
Under examples/cifar10/ in the Caffe root there is a script, create_cifar10.sh, that converts the data:
#!/usr/bin/env sh
# This script converts the cifar data into leveldb format.
set -e
EXAMPLE=examples/cifar10
DATA=data/cifar10
DBTYPE=lmdb
echo "Creating $DBTYPE..."
rm -rf $EXAMPLE/cifar10_train_$DBTYPE $EXAMPLE/cifar10_test_$DBTYPE
./build/examples/cifar10/convert_cifar_data.bin $DATA $EXAMPLE $DBTYPE
echo "Computing image mean..."
./build/tools/compute_image_mean -backend=$DBTYPE \
$EXAMPLE/cifar10_train_$DBTYPE $EXAMPLE/mean.binaryproto
echo "Done."
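The script above uses the convert_cifar_data.bin executable to convert the data to LMDB format, and the compute_image_mean program to generate the mean file. Run it from the Caffe root directory:
./examples/cifar10/create_cifar10.sh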
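To make the role of the mean file concrete, the sketch below computes a per-pixel mean image over a stack of images with numpy and subtracts it. This is only an illustration of the idea, not the code of compute_image_mean; `compute_mean_image` is a hypothetical helper and the images are random.

```python
import numpy as np


def compute_mean_image(images):
    """Average a stack of N x C x H x W images into one C x H x W mean image."""
    images = np.asarray(images, dtype=np.float64)
    if images.ndim != 4:
        raise ValueError("expected an N x C x H x W array")
    return images.mean(axis=0)


# Random stand-in data, for illustration only.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(8, 3, 32, 32), dtype=np.uint8)
mean_image = compute_mean_image(fake_images)

# Subtracting the mean centers every pixel position around zero.
centered = fake_images - mean_image
print(mean_image.shape, float(np.abs(centered.mean(axis=0)).max()))
```

The transform_param block in the network definition applies the same kind of subtraction, using mean.binaryproto as the mean image.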
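Training
The cifar10 directory contains a set of quick files and a set of full files. The main difference is training length: quick runs only 5000 iterations in total, while full trains for far longer. Both use all 50000 training images.
I am training inside a virtual machine, which is slow, so I use the quick configuration here.
With a GPU, running the full configuration is practical as well.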
vim train_quick.sh
#!/usr/bin/env sh
set -e
TOOLS=./build/tools
$TOOLS/caffe train \
--solver=examples/cifar10/cifar10_quick_solver.prototxt $@
# reduce learning rate by factor of 10 after 8 epochs
$TOOLS/caffe train \
--solver=examples/cifar10/cifar10_quick_solver_lr1.prototxt \
--snapshot=examples/cifar10/cifar10_quick_iter_4000.solverstate $@
Next, look at cifar10_quick_solver.prototxt, the first solver file the script uses:
# reduce the learning rate after 8 epochs (4000 iters) by a factor of 10
# The train/test net protocol buffer definition
net: "examples/cifar10/cifar10_quick_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.001 # the learning rate is 0.001 for the first 4000 iterations
momentum: 0.9
weight_decay: 0.004
# The learning rate policy
lr_policy: "fixed"
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 4000
# snapshot intermediate results
snapshot: 4000
snapshot_prefix: "examples/cifar10/cifar10_quick"
# solver mode: CPU or GPU
solver_mode: CPU
Next, look at cifar10_quick_solver_lr1.prototxt, the second solver file the script uses:
# reduce the learning rate after 8 epochs (4000 iters) by a factor of 10
# The train/test net protocol buffer definition
net: "examples/cifar10/cifar10_quick_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.0001 # iterations 4000-5000 use 0.0001, one tenth of the earlier rate
momentum: 0.9
weight_decay: 0.004
# The learning rate policy
lr_policy: "fixed"
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 5000
# snapshot intermediate results
snapshot: 5000
snapshot_format: HDF5
snapshot_prefix: "examples/cifar10/cifar10_quick"
# solver mode: CPU or GPU
solver_mode: CPU
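The numbers in the two solver files can be cross-checked with a little arithmetic. The sketch below assumes the batch size of 100 set in cifar10_quick_train_test.prototxt; the function names are made up for illustration.

```python
TRAIN_IMAGES = 50000
TEST_IMAGES = 10000
BATCH_SIZE = 100  # batch_size in cifar10_quick_train_test.prototxt


def epochs(iterations, batch_size=BATCH_SIZE, dataset_size=TRAIN_IMAGES):
    """Number of passes over the training set after the given iterations."""
    return iterations * batch_size / float(dataset_size)


def learning_rate(iteration):
    """Two-stage quick schedule: 0.001 before iteration 4000, then 0.0001."""
    return 0.001 if iteration < 4000 else 0.0001


first_stage_epochs = epochs(4000)   # matches the "8 epochs" comment
total_epochs = epochs(5000)
test_images_covered = 100 * BATCH_SIZE  # test_iter * test batch size
print(first_stage_epochs, total_epochs, test_images_covered == TEST_IMAGES)
print(learning_rate(0), learning_rate(4500))
```

So the first stage covers 8 epochs, the second stage adds 2 more at the lower rate, and each test pass covers all 10000 test images.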
Next, look at cifar10_quick_train_test.prototxt, the network definition:
name: "CIFAR10_quick"
layer {
name: "cifar"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mean_file: "examples/cifar10/mean.binaryproto"
}
data_param {
source: "examples/cifar10/cifar10_train_lmdb"
batch_size: 100
backend: LMDB
}
}
layer {
name: "cifar"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mean_file: "examples/cifar10/mean.binaryproto"
}
data_param {
source: "examples/cifar10/cifar10_test_lmdb"
batch_size: 100
backend: LMDB
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 32
pad: 2
kernel_size: 5
stride: 1
weight_filler {
type: "gaussian"
std: 0.0001
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "pool1"
top: "pool1"
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 32
pad: 2
kernel_size: 5
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: AVE
kernel_size: 3
stride: 2
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "pool2"
top: "conv3"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 64
pad: 2
kernel_size: 5
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "pool3"
type: "Pooling"
bottom: "conv3"
top: "pool3"
pooling_param {
pool: AVE
kernel_size: 3
stride: 2
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "pool3"
top: "ip1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 64
weight_filler {
type: "gaussian"
std: 0.1
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "ip1"
top: "ip2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 10
weight_filler {
type: "gaussian"
std: 0.1
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "ip2"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip2"
bottom: "label"
top: "loss"
}
For what the fields in the network file above mean, see the earlier article (link).
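As a rough aid to reading the network file, the sketch below traces the spatial size of the feature maps through the three conv/pool stages. It assumes Caffe's usual output-size rules (floor for convolution, ceil for pooling); the helper names are hypothetical.

```python
import math


def conv_out(size, kernel, pad=0, stride=1):
    """Convolution output size (rounds down)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, pad=0, stride=1):
    """Pooling output size (rounds up, as Caffe's pooling layer does)."""
    return int(math.ceil((size + 2 * pad - kernel) / float(stride))) + 1


size = 32  # CIFAR-10 images are 32 x 32
shapes = []
for index, channels in ((1, 32), (2, 32), (3, 64)):
    size = conv_out(size, kernel=5, pad=2, stride=1)
    shapes.append(("conv%d" % index, channels, size))
    size = pool_out(size, kernel=3, stride=2)
    shapes.append(("pool%d" % index, channels, size))

ip1_inputs = shapes[-1][1] * size * size
for name, channels, side in shapes:
    print("%s: %d x %d x %d" % (name, channels, side, side))
print("ip1 inputs:", ip1_inputs)
```

Under these assumptions the maps shrink from 32 to 16 to 8 to 4 pixels per side, so the first inner-product layer sees 64 x 4 x 4 = 1024 inputs.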
To analyze the network structure further, draw the network graph, which makes it easier to follow:
python ./python/draw_net.py ./examples/cifar10/cifar10_quick_train_test.prototxt ./examples/cifar10/cifar10_quick.png --rankdir=LR
eog ./examples/cifar10/cifar10_quick.png
The result:
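Start training from the Caffe root directory. On a CPU this is painfully slow.
./examples/cifar10/train_quick.sh
According to the training output, after 5000 iterations accuracy=0.7595 and loss=0.72916.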
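Testing or inference
Testing can use the classification.bin program bundled with Caffe, or the Python script python/classify.py.
For test images, CAFFE_ROOT/examples/images contains cat.jpg and other pictures; you can also add images of other classes yourself.
(1) Classifying with classification.bin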
Usage: ./build/examples/cpp_classification/classification.bin deploy.prototxt network.caffemodel mean.binaryproto labels.txt img.jpg
./build/examples/cpp_classification/classification.bin \
examples/cifar10/cifar10_quick.prototxt \
examples/cifar10/cifar10_quick_iter_5000.caffemodel.h5 \
examples/cifar10/mean.binaryproto \
data/cifar10/batches.meta.txt \
examples/images/cat.jpg
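The run fails with an error: the label file yields 11 labels, which does not match the network's 10 outputs.
Edit batches.meta.txt and delete the blank line at the end of the file. Running the command again then succeeds.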
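The label-count mismatch is easy to reproduce outside Caffe. The sketch below assumes the label file is the ten class names followed by one empty line, and compares a naive line count with one that skips blank lines; `load_labels` is a hypothetical helper, not part of classification.bin.

```python
def load_labels(text):
    """Return the non-empty lines of a label file, stripped of whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Stand-in for batches.meta.txt: ten class names plus a trailing blank line.
raw_text = (
    "airplane\nautomobile\nbird\ncat\ndeer\n"
    "dog\nfrog\nhorse\nship\ntruck\n\n"
)
naive_count = len(raw_text.splitlines())  # counts the blank line as a label
labels = load_labels(raw_text)
print(naive_count, len(labels))
```

Reading line by line counts the empty last line as an eleventh label, which is consistent with the error above; filtering it out leaves the expected 10.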
Surprisingly, deer gets the highest probability, while cat scores only 0.0261. That accuracy is disappointingly low!
(2) Testing with the Python script python/classify.py
Here is the source of classify.py:
#!/usr/bin/env python
"""
classify.py is an out-of-the-box image classifer callable from the command line.
By default it configures and runs the Caffe reference ImageNet model.
"""
import numpy as np
import os
import sys
import argparse
# argparse is the Python standard-library module for handling command-line arguments.
# Usage:
# (1) import argparse: import the module
# (2) parser = argparse.ArgumentParser(): create a parser object
# (3) parser.add_argument(): register the arguments and options you care about
# (4) parser.parse_args(): parse the command line
import glob
import time
import caffe


def main(argv):
    pycaffe_dir = os.path.dirname(__file__)
    parser = argparse.ArgumentParser()
    # Required arguments: input and output files.
    parser.add_argument(
        "input_file",
        help="Input image, directory, or npy."
    )
    parser.add_argument(
        "output_file",
        help="Output npy filename."
    )
    # Optional arguments.
    parser.add_argument(
        "--model_def",
        default=os.path.join(pycaffe_dir,
                "../models/bvlc_reference_caffenet/deploy.prototxt"),
        help="Model definition file."
    )
    parser.add_argument(
        "--pretrained_model",
        default=os.path.join(pycaffe_dir,
                "../models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel"),
        help="Trained model weights file."
    )
    parser.add_argument(
        "--gpu",
        action='store_true',
        help="Switch for gpu computation."
    )
    parser.add_argument(
        "--center_only",
        action='store_true',
        help="Switch for prediction from center crop alone instead of " +
             "averaging predictions across crops (default)."
    )
    parser.add_argument(
        "--images_dim",
        default='256,256',
        help="Canonical 'height,width' dimensions of input images."
    )
    parser.add_argument(
        "--mean_file",
        default=os.path.join(pycaffe_dir,
                             'caffe/imagenet/ilsvrc_2012_mean.npy'),
        help="Data set image mean of [Channels x Height x Width] dimensions " +
             "(numpy array). Set to '' for no mean subtraction."
    )
    parser.add_argument(
        "--input_scale",
        type=float,
        help="Multiply input features by this scale to finish preprocessing."
    )
    parser.add_argument(
        "--raw_scale",
        type=float,
        default=255.0,
        help="Multiply raw input by this scale before preprocessing."
    )
    parser.add_argument(
        "--channel_swap",
        default='2,1,0',
        help="Order to permute input channels. The default converts " +
             "RGB -> BGR since BGR is the Caffe default by way of OpenCV."
    )
    parser.add_argument(
        "--ext",
        default='jpg',
        help="Image file extension to take as input when a directory " +
             "is given as the input file."
    )
    args = parser.parse_args()

    image_dims = [int(s) for s in args.images_dim.split(',')]

    mean, channel_swap = None, None
    if args.mean_file:
        mean = np.load(args.mean_file)
    if args.channel_swap:
        channel_swap = [int(s) for s in args.channel_swap.split(',')]

    if args.gpu:
        caffe.set_mode_gpu()
        print("GPU mode")
    else:
        caffe.set_mode_cpu()
        print("CPU mode")

    # Make classifier.
    classifier = caffe.Classifier(args.model_def, args.pretrained_model,
            image_dims=image_dims, mean=mean,
            input_scale=args.input_scale, raw_scale=args.raw_scale,
            channel_swap=channel_swap)

    # Load numpy array (.npy), directory glob (*.jpg), or image file.
    args.input_file = os.path.expanduser(args.input_file)
    if args.input_file.endswith('npy'):
        print("Loading file: %s" % args.input_file)
        inputs = np.load(args.input_file)
    elif os.path.isdir(args.input_file):
        print("Loading folder: %s" % args.input_file)
        inputs = [caffe.io.load_image(im_f)
                  for im_f in glob.glob(args.input_file + '/*.' + args.ext)]
    else:
        print("Loading file: %s" % args.input_file)
        inputs = [caffe.io.load_image(args.input_file)]

    print("Classifying %d inputs." % len(inputs))

    # Classify.
    start = time.time()
    predictions = classifier.predict(inputs, not args.center_only)
    print("Done in %.2f s." % (time.time() - start))

    # Save
    print("Saving results into %s" % args.output_file)
    np.save(args.output_file, predictions)


if __name__ == '__main__':
    main(sys.argv)
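The mean handling needs to be changed, because it does not work as written. Presumably the array loaded from the mean file is a full [Channels x Height x Width] image whose size does not match the 32 x 32 network input, so it has to be reduced to one value per channel.
In classify.py, find
mean = np.load(args.mean_file)
and add one line below it, at the same indentation:
mean=mean.mean(1).mean(1)
To print the test result, also add the following to python/classify.py after the predictions are computed:
print("Predictions:%s" % predictions)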
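What the added mean line does can be seen on a stand-in array. The sketch below assumes a 3 x 256 x 256 mean image (the shape and the channel values are made up for illustration) and reduces it to one value per channel, which can then be broadcast against an input of any spatial size.

```python
import numpy as np

# Stand-in mean image with constant channels, for illustration only.
full_mean = np.stack([np.full((256, 256), value) for value in (104.0, 117.0, 123.0)])

# Same reduction as the line added to classify.py: average over height, then width.
channel_mean = full_mean.mean(1).mean(1)

# A per-channel mean broadcasts against a 3 x 32 x 32 input.
fake_input = np.full((3, 32, 32), 128.0)
centered = fake_input - channel_mean[:, np.newaxis, np.newaxis]
print(full_mean.shape, channel_mean.shape, centered[:, 0, 0])
```

The first `mean(1)` collapses the height axis and the second collapses what used to be the width axis, leaving a length-3 vector.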
Run the test:
python python/classify.py \
--model_def examples/cifar10/cifar10_quick.prototxt \
--pretrained_model examples/cifar10/cifar10_quick_iter_5000.caffemodel.h5 \
examples/images/cat.jpg save.file
The 4th class has the highest predicted probability, 0.243992. Looking at data/cifar10/batches.meta.txt, the 4th class is cat.
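Rather than counting positions by hand, the prediction vector can be mapped to class names. The sketch below uses the CIFAR-10 class order and a made-up probability vector (not the output of the run above); `top_k` is a hypothetical helper.

```python
import numpy as np

CIFAR10_LABELS = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def top_k(probabilities, labels, k=3):
    """Return the k most likely (label, probability) pairs, best first."""
    probabilities = np.asarray(probabilities, dtype=np.float64).ravel()
    if probabilities.size != len(labels):
        raise ValueError("one probability per label is required")
    order = np.argsort(probabilities)[::-1][:k]
    return [(labels[i], float(probabilities[i])) for i in order]


# Made-up probabilities, for illustration only.
fake_probabilities = [0.05, 0.04, 0.12, 0.30, 0.20, 0.10, 0.06, 0.05, 0.04, 0.04]
ranked = top_k(fake_probabilities, CIFAR10_LABELS)
print(ranked)
```

With the real output, the input would be one row of the array that classify.py saves with np.save.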
Judging from the results above, the prediction accuracy of this CIFAR-10 model is quite low; it is used here only as an experiment.