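I am building a dataset for a two-class (binary) classification task.
First, create a folder named Ttrain.
Next, create two subfolders under Ttrain: train and val.
Then create two folders under train, here named train_good and train_bad.
Do the same for val: create two folders under it named val_good and val_bad.
Finally, put your images into these folders according to their class.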
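If you would rather not create the folders by hand, the layout above can be sketched in a few lines of Python. This is only a minimal sketch: the helper name make_layout and the SPLITS table are made up for illustration, and only the folder names come from the steps above.

```python
import os

# Class folder names follow the layout described above.
SPLITS = {
    "train": ["train_good", "train_bad"],
    "val": ["val_good", "val_bad"],
}


def make_layout(root="Ttrain"):
    """Create root/train/train_good, root/train/train_bad, and so on."""
    created = []
    for split, class_dirs in SPLITS.items():
        for class_dir in class_dirs:
            path = os.path.join(root, split, class_dir)
            os.makedirs(path, exist_ok=True)  # no error if it already exists
            created.append(path)
    return created
```

Calling make_layout("Ttrain") from the project directory creates the four class folders and returns their paths; the images still have to be copied in by class afterwards.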
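Step 2 is to make the labels.
I generate the labels with a Python script.
Create a project in PyCharm and put the prepared Ttrain folder in it. Then create a file named LabelMaker.py, which assigns the labels 0 and 1.
The code is as follows: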
# coding: utf-8
"""
Created on Jul 29, 2016
@author: zhuiyunzhugang
"""
import os


def IsSubString(SubStrList, Str):
    # True only if every substring in SubStrList occurs in Str
    return all(substr in Str for substr in SubStrList)


def GetFileList(FindPath, FlagStr=None):
    # Return the sorted full paths of the entries in FindPath whose names
    # contain every substring in FlagStr (all entries if FlagStr is empty)
    FlagStr = FlagStr or []
    FileList = []
    for fn in os.listdir(FindPath):
        if IsSubString(FlagStr, fn):
            FileList.append(os.path.join(FindPath, fn))
    FileList.sort()
    return FileList


# Training set file list: label 1 = good, label 0 = bad
with open('Ftrain.txt', 'w') as train_txt:
    for img in GetFileList('../Ttrain/train/train_good'):
        train_txt.write(img + ' 1\n')
    for img in GetFileList('../Ttrain/train/train_bad'):
        train_txt.write(img + ' 0\n')

# Validation set file list
with open('Fval.txt', 'w') as test_txt:
    for img in GetFileList('../Ttrain/val/val_good'):
        test_txt.write(img + ' 1\n')
    for img in GetFileList('../Ttrain/val/val_bad'):
        test_txt.write(img + ' 0\n')

print("File lists generated successfully")
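After generating the labels, open each label file and use find-and-replace so that every line keeps only the path relative to the data root, e.g. train_good/filename. This matters because convert_imageset prepends the data root folder to each listed path. Do the same for the other classes and for the val list.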
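The manual find-and-replace can be avoided by writing the relative paths directly. The following is a minimal sketch of that idea, not part of the original script: write_label_file and its parameters are hypothetical names, and it assumes each class folder contains only image files.

```python
import os


def write_label_file(split_root, class_labels, out_path):
    """Write '<class folder>/<file name> <label>' lines for one split.

    split_root   -- e.g. '../Ttrain/train', the folder later used as data root
    class_labels -- dict mapping class folder name to integer label
    out_path     -- label file to write, e.g. 'Ftrain.txt'
    """
    count = 0
    with open(out_path, "w") as out:
        for class_dir, label in class_labels.items():
            folder = os.path.join(split_root, class_dir)
            for name in sorted(os.listdir(folder)):
                # Path is relative to split_root, so no later editing is needed
                out.write("%s/%s %d\n" % (class_dir, name, label))
                count += 1
    return count
```

For the training list this would be called as write_label_file('../Ttrain/train', {'train_good': 1, 'train_bad': 0}, 'Ftrain.txt'), and similarly for val with val_good and val_bad.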
Step 3 is to generate the dataset.
Note down the paths of the images and the label files. The code is as follows:
#!/usr/bin/env sh
# Create the imagenet lmdb inputs
# N.B. set the path to the imagenet train + val data dirs
set -e
EXAMPLE=caffe/My_Files/Build_lmdb2
DATA=caffe/My_Files/label2
TOOLS=caffe/build/tools
TRAIN_DATA_ROOT=caffe/My_Files/Ttrain/train/
VAL_DATA_ROOT=caffe/My_Files/Ttrain/val/
# Set RESIZE=true to resize the images to 66x66. Leave as false if images have
# already been resized using another tool.
RESIZE=false
if $RESIZE; then
RESIZE_HEIGHT=66
RESIZE_WIDTH=66
else
RESIZE_HEIGHT=0
RESIZE_WIDTH=0
fi
if [ ! -d "$TRAIN_DATA_ROOT" ]; then
echo "Error: TRAIN_DATA_ROOT is not a path to a directory: $TRAIN_DATA_ROOT"
echo "Set the TRAIN_DATA_ROOT variable in create_imagenet.sh to the path" \
"where the ImageNet training data is stored."
exit 1
fi
if [ ! -d "$VAL_DATA_ROOT" ]; then
echo "Error: VAL_DATA_ROOT is not a path to a directory: $VAL_DATA_ROOT"
echo "Set the VAL_DATA_ROOT variable in create_imagenet.sh to the path" \
"where the ImageNet validation data is stored."
exit 1
fi
echo "Creating train lmdb..."
GLOG_logtostderr=1 $TOOLS/convert_imageset \
--resize_height=$RESIZE_HEIGHT \
--resize_width=$RESIZE_WIDTH \
--shuffle \
$TRAIN_DATA_ROOT \
$DATA/Ftrain.txt \
$EXAMPLE/train_lmdb
echo "Creating val lmdb..."
GLOG_logtostderr=1 $TOOLS/convert_imageset \
--resize_height=$RESIZE_HEIGHT \
--resize_width=$RESIZE_WIDTH \
--shuffle \
$VAL_DATA_ROOT \
$DATA/Fval.txt \
$EXAMPLE/val_lmdb
echo "Done."
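EXAMPLE is where the LMDB datasets are written, and DATA is where the label files are stored. The paths are relative, so run the script from the directory that contains caffe.
Then just run the script:
sudo sh filename.sh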
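Before running the conversion, it can help to confirm that every path in a label file really exists under the data root, since a wrong prefix is the most likely mistake here. The check below is a rough sketch under that assumption; check_label_file is a hypothetical helper and is not part of Caffe.

```python
import os
from collections import Counter


def check_label_file(list_path, data_root):
    """Return (label counts, listed images that are missing under data_root)."""
    counts = Counter()
    missing = []
    with open(list_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Each line is '<relative path> <label>'; split on the last space
            rel_path, label = line.rsplit(" ", 1)
            counts[int(label)] += 1
            if not os.path.isfile(os.path.join(data_root, rel_path)):
                missing.append(rel_path)
    return counts, missing
```

For example, check_label_file('Ftrain.txt', 'caffe/My_Files/Ttrain/train/') should return an empty missing list if the label file and TRAIN_DATA_ROOT agree, and the counts show how balanced the two classes are.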
Step 4 is to generate the mean file. The code is as follows:
#!/usr/bin/env sh
# Compute the mean image from the imagenet training lmdb
# N.B. this is available in data/ilsvrc12
# Location of train_lmdb
EXAMPLE=caffe/My_Files/Build_lmdb2
DATA=caffe/My_Files/Build_lmdb2
TOOLS=caffe/build/tools
$TOOLS/compute_image_mean $EXAMPLE/train_lmdb \
$DATA/mean.binaryproto
echo "Done."
Just run it:
sudo sh filename.sh
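Conceptually, the mean file holds the element-by-element average of all training images, which is why every image in the LMDB has to have the same size. The sketch below only illustrates that averaging in plain Python on flat pixel lists; it does not read an LMDB or write a .binaryproto file, and mean_image is a hypothetical helper.

```python
def mean_image(images):
    """Element-wise mean of equally sized images given as flat lists of pixels."""
    if not images:
        raise ValueError("need at least one image")
    size = len(images[0])
    sums = [0.0] * size
    for img in images:
        if len(img) != size:
            raise ValueError("all images must have the same size")
        for i, value in enumerate(img):
            sums[i] += value
    # Divide each accumulated sum by the number of images
    return [s / len(images) for s in sums]
```

With two tiny two-pixel "images", mean_image([[0, 10], [10, 30]]) averages the first pixels together and the second pixels together.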
All done!