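Building a Caffe dataset. I'm a beginner, and these are the pitfalls I worked through one step at a time. I hope this helps.

My dataset is for binary classification.

First, create a folder named Ttrain.

Next, create two subfolders under Ttrain: train and val.

Then, inside the train folder, create two more folders, here named train_good and train_bad.

The val folder is similar: inside val, create two folders named val_good and val_bad.

After that, put your images into these folders according to their class.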
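If you would rather not create the folders by hand, the layout above can be sketched in a few lines of Python. This is only a minimal sketch: the function name make_layout and the SPLITS/CLASSES constants are my own, and only the folder names come from the steps above.

```python
import os

# Folder names follow the steps above; the constant names are hypothetical.
SPLITS = ("train", "val")
CLASSES = ("good", "bad")


def make_layout(root="Ttrain"):
    """Create root/<split>/<split>_<class> for every split and class."""
    created = []
    for split in SPLITS:
        for cls in CLASSES:
            path = os.path.join(root, split, f"{split}_{cls}")
            os.makedirs(path, exist_ok=True)  # no error if it already exists
            created.append(path)
    return created
```

Calling make_layout() with no argument creates Ttrain/train/train_good, Ttrain/train/train_bad, Ttrain/val/val_good and Ttrain/val/val_bad under the current directory.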

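The second step is making the labels.

I made the labels with a Python script.

Create a project in PyCharm and put the prepared Ttrain folder into it. Then create a file named LabelMaker.py, which writes label 1 for the good images and label 0 for the bad ones. Note that the script refers to the folder as ../Ttrain, so adjust that path to match where LabelMaker.py sits relative to Ttrain.

The code is as follows: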

# coding:utf-8
'''
Created on Jul 29, 2016

@author: zhuiyunzhugang
'''
import os


def IsSubString(SubStrList, Str):
    flag = True
    for substr in SubStrList:
        if not (substr in Str):
            flag = False

    return flag


def GetFileList(FindPath, FlagStr=[]):
    FileList = []
    FileNames = os.listdir(FindPath)
    if len(FileNames) > 0:
        for fn in FileNames:
            if len(FlagStr) > 0:
                if IsSubString(FlagStr, fn):
                    fullfilename = os.path.join(FindPath, fn)
                    FileList.append(fullfilename)
            else:
                fullfilename = os.path.join(FindPath, fn)
                FileList.append(fullfilename)

    if len(FileList) > 0:
        FileList.sort()

    return FileList


# File list for the training set: good images get label 1, bad images get label 0
train_txt = open('Ftrain.txt', 'w')
imgfile = GetFileList('../Ttrain/train/train_good')
for img in imgfile:
    str1 = img + ' ' + '1' + '\n'
    train_txt.write(str1)

imgfile = GetFileList('../Ttrain/train/train_bad')
for img in imgfile:
    str2 = img + ' ' + '0' + '\n'
    train_txt.write(str2)
train_txt.close()

# File list for the validation set, labeled the same way
test_txt = open('Fval.txt', 'w')
imgfile = GetFileList('../Ttrain/val/val_good')
for img in imgfile:
    str3 = img + ' ' + '1' + '\n'
    test_txt.write(str3)

imgfile = GetFileList('../Ttrain/val/val_bad')
for img in imgfile:
    str4 = img + ' ' + '0' + '\n'
    test_txt.write(str4)
test_txt.close()
print("File lists generated successfully")

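After generating the label files, open them and use find-and-replace (from the right-click menu) to strip the leading directory prefix, so that each line keeps only the part starting at the class folder, for example train_good/ followed by the file name and the label. Do the same for the other folders. This matters because convert_imageset prepends the data root folder (TRAIN_DATA_ROOT or VAL_DATA_ROOT in the script further down) to every path in the list.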
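The manual find-and-replace can probably be skipped by writing the relative paths directly. Here is a minimal sketch of that idea, not the script I actually used: write_label_file and LABELS are hypothetical names, and it assumes the folders are named <split>_good and <split>_bad as above.

```python
import os

# Label values follow the script above: good = 1, bad = 0.
LABELS = {"good": 1, "bad": 0}


def write_label_file(split_dir, out_path):
    """Write 'subfolder/filename label' lines for one split (train or val).

    split_dir is e.g. '../Ttrain/train'; paths are written relative to it.
    Returns the number of lines written.
    """
    split = os.path.basename(os.path.normpath(split_dir))  # 'train' or 'val'
    count = 0
    with open(out_path, "w") as out:
        for cls, label in LABELS.items():
            sub = f"{split}_{cls}"  # e.g. train_good
            folder = os.path.join(split_dir, sub)
            for name in sorted(os.listdir(folder)):
                # Forward slash so the line has the 'train_good/filename' form.
                out.write(f"{sub}/{name} {label}\n")
                count += 1
    return count
```

For example, write_label_file('../Ttrain/train', 'Ftrain.txt') and write_label_file('../Ttrain/val', 'Fval.txt') would produce the two list files with the prefix already removed.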

The third step is generating the dataset.

Make a note of the image and label paths. The code is as follows:

#!/usr/bin/env sh
# Create the imagenet lmdb inputs
# N.B. set the path to the imagenet train + val data dirs
set -e

EXAMPLE=caffe/My_Files/Build_lmdb2
DATA=caffe/My_Files/label2
TOOLS=caffe/build/tools

TRAIN_DATA_ROOT=caffe/My_Files/Ttrain/train/
VAL_DATA_ROOT=caffe/My_Files/Ttrain/val/

# Set RESIZE=true to resize the images to 66x66. Leave as false if images have
# already been resized using another tool.
RESIZE=false
if $RESIZE; then
  RESIZE_HEIGHT=66
  RESIZE_WIDTH=66
else
  RESIZE_HEIGHT=0
  RESIZE_WIDTH=0
fi

if [ ! -d "$TRAIN_DATA_ROOT" ]; then
  echo "Error: TRAIN_DATA_ROOT is not a path to a directory: $TRAIN_DATA_ROOT"
  echo "Set the TRAIN_DATA_ROOT variable in this script to the path" \
       "where the training data is stored."
  exit 1
fi

if [ ! -d "$VAL_DATA_ROOT" ]; then
  echo "Error: VAL_DATA_ROOT is not a path to a directory: $VAL_DATA_ROOT"
  echo "Set the VAL_DATA_ROOT variable in this script to the path" \
       "where the validation data is stored."
  exit 1
fi

echo "Creating train lmdb..."

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT \
    $DATA/Ftrain.txt \
    $EXAMPLE/train_lmdb

echo "Creating val lmdb..."

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $VAL_DATA_ROOT \
    $DATA/Fval.txt \
    $EXAMPLE/val_lmdb

echo "Done."

EXAMPLE is where the generated dataset (the LMDB folders) is written. DATA is where the label files are.
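Since convert_imageset joins the data root and each path from the list file, a wrong prefix or a missing image only shows up once the tool runs. A quick check beforehand can save a round trip. The following is a minimal sketch under that assumption; check_label_file is a hypothetical helper, not part of Caffe.

```python
import os
from collections import Counter


def check_label_file(list_path, data_root):
    """Return (per-label counts, list of problem descriptions)."""
    counts = Counter()
    problems = []
    with open(list_path) as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            # Split on the last space: everything before it is the path.
            rel_path, _, label = line.rpartition(" ")
            if not rel_path or not label.isdigit():
                problems.append(f"line {lineno}: expected 'path label'")
                continue
            if not os.path.isfile(os.path.join(data_root, rel_path)):
                problems.append(f"line {lineno}: missing file {rel_path}")
                continue
            counts[int(label)] += 1
    return counts, problems
```

For instance, check_label_file('caffe/My_Files/label2/Ftrain.txt', 'caffe/My_Files/Ttrain/train/') returns how many valid images each label has, plus a list of lines that would likely fail during conversion.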

Then just run the script:

sudo sh filename.sh

Fourth, generate the mean file. The code is as follows:

#!/usr/bin/env sh
# Compute the mean image from the imagenet training lmdb
# N.B. this is available in data/ilsvrc12

EXAMPLE=caffe/My_Files/Build_lmdb2  # location of train_lmdb
DATA=caffe/My_Files/Build_lmdb2
TOOLS=caffe/build/tools

$TOOLS/compute_image_mean $EXAMPLE/train_lmdb \
$DATA/mean.binaryproto

echo "Done."

Just run it:

sudo sh filename.sh
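As far as I understand it, compute_image_mean averages all images in train_lmdb position by position and saves the result as mean.binaryproto. The toy sketch below only illustrates that idea on tiny grayscale "images" given as nested lists. It is not the tool's implementation, and mean_image is a name I made up.

```python
def mean_image(images):
    """Per-pixel mean of equally sized images given as nested lists [row][col]."""
    if not images:
        raise ValueError("need at least one image")
    height, width = len(images[0]), len(images[0][0])
    total = [[0.0] * width for _ in range(height)]
    for img in images:
        for r in range(height):
            for c in range(width):
                total[r][c] += img[r][c]  # accumulate each pixel position
    n = len(images)
    return [[value / n for value in row] for row in total]
```

With two 1x2 images [[0, 2]] and [[4, 6]], the mean image is [[2.0, 4.0]]. During training, this mean is what gets subtracted from each input image.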

All done!
