2020-01-06: InsightFace in practice (2): data preparation

1. Project preparation

1. Background reading on the project: https://blog.csdn.net/hanjiangxue_wei/article/details/86566435

2. Project address: https://github.com/deepinsight/insightface

3. Clone the project to the server

From an Xshell command-line session:

  • Connect to the server: ssh followed by the server domain name, then enter the user name and password;
  • Switch the working directory to the desired location: cd dir;
  • Clone the project into this directory: git clone https://github.com/deepinsight/insightface

2. Raw data download: taking LFW as the example, download the original LFW image data.

3. Data production process

1. Data alignment

  • Create the lfwdata folder at insightface/datasets/lfwdata/; it holds the raw LFW data (lfw) and the aligned data (lfw_align);
  • Create the lfw_align folder at insightface/datasets/lfwdata/lfw_align; it holds the cropped and aligned face images;
  • In the Xshell command line, activate the environment, switch to the directory containing insightface/src/align/align_lfw.py, and run: python3 align_lfw.py --input-dir './insightface/datasets/lfwdata/lfw' --output-dir './insightface/datasets/lfwdata/lfw_align'
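
The two folders described above can also be created from Python instead of by hand. This is a minimal sketch, not part of the insightface repository; the function name make_lfw_dirs and the root argument (the path of your insightface clone) are assumptions made for illustration.

```python
from pathlib import Path


def make_lfw_dirs(root):
    """Create the lfwdata layout under `root` and return the two paths.

    `root` is assumed to be the top of the cloned insightface repository.
    """
    base = Path(root) / "datasets" / "lfwdata"
    layout = {
        "raw": base / "lfw",            # raw LFW images, one sub-folder per person
        "aligned": base / "lfw_align",  # target folder for align_lfw.py output
    }
    for path in layout.values():
        # parents=True builds datasets/lfwdata as needed;
        # exist_ok=True makes the call safe to repeat
        path.mkdir(parents=True, exist_ok=True)
    return layout
```

Calling make_lfw_dirs("./insightface") a second time is harmless, because existing folders are left untouched.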

2. Generate the list file: stored under insightface/datasets/lfwdata/lfw

  • Create a new file generatelst.py under insightface/src/data/ with the following content:
import os
import random
import argparse


class PairGenerator:
    def __init__(self, data_dir, pairs_filepath, img_ext):
        """
        Parameter data_dir, is your data directory.
        Parameter pairs_filepath, where is the pairs.txt that belongs to.
        Parameter img_ext, is the image data extension for all of your image data.
        """
        self.data_dir = data_dir
        self.pairs_filepath = pairs_filepath
        self.img_ext = img_ext

    # unfinished helper: builds a shuffled list with one folder-name entry per image;
    # the actual split into 10 sets is not implemented
    def split_to_10(self):
        folders = []
        cnt = 0
        for name in os.listdir(self.data_dir):
            folders.append(name)
        folders = sorted(folders) # sorting names in abc order

        a = []
        # names of folders - e.g. Talgat Bigeldinov, Kairat Nurtas, etc.
        for name in folders:
            # f = open(self.pairs_filepath, 'a+')
            # looping through image files in one folder
            for file in os.listdir(self.data_dir + '/' + name):
                # a.append(data_dir + name + '/' + file)

                a.append(name)
                cnt = cnt + 1
            cnt = cnt + 1
        random.shuffle(a)


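    def write_similar(self, lst):
        # append 20 random pairs drawn from one person's images, labelled 1;
        # the with-block closes the file, which the original never did
        with open(self.pairs_filepath, 'a+') as f:
            for i in range(20):
                left = random.choice(lst)
                right = random.choice(lst)
                f.write(left + '\t' + right + '\t' + '1\n')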

    # writing "1 IMAGE_PATH LABEL" lines like the insightface lst file needs
    def write_item_label(self):
        cnt = 0
        # open the list file once instead of once per folder
        with open(self.pairs_filepath, 'a+') as f:
            for name in os.listdir(self.data_dir):
                if name == ".DS_Store":
                    continue
                for file in os.listdir(self.data_dir + '/' + name):
                    if file == ".DS_Store":
                        continue
                    # use self.data_dir rather than the module-level data_dir
                    f.write(str(1) + '\t' + self.data_dir + '/' + name + '/' + file + '\t' + str(cnt) + '\n')
                cnt = cnt + 1
    # writing "1 IMAGE_PATH LABEL" lines like the insightface lst file needs, in alphabetical order
    def write_item_label_abc(self):
        cnt = 0
        # sort folder names so labels are assigned in alphabetical order
        names = sorted(n for n in os.listdir(self.data_dir) if n != ".DS_Store")

        # 'a+' appends: delete an old list file before re-running
        with open(self.pairs_filepath, 'a+') as f:
            for name in names:
                print(name)
                for file in os.listdir(self.data_dir + '/' + name):
                    if file == ".DS_Store":
                        continue
                    # use self.data_dir rather than the module-level data_dir
                    f.write(str(1) + '\t' + self.data_dir + '/' + name + '/' + file + '\t' + str(cnt) + '\n')
                cnt = cnt + 1

    def write_different(self, lst1, lst2):
        f = open(self.pairs_filepath, 'a+')
        for i in range(500):
            left = random.choice(lst1)
            right = random.choice(lst2)
            f.write(left + '\t' + right + '\t' + '0\n')
        f.close()

    def generate_pairs(self):
        for name in os.listdir(self.data_dir):
            if name == ".DS_Store":
                continue

            a = []
            for file in os.listdir(self.data_dir + '/' + name):
                if file == ".DS_Store":
                    continue
                a.append(name + '/' + file)

            # call the method on this instance, not on the global created in __main__
            self.write_similar(a)

    def generate_non_pairs(self):
        # folder names in reverse alphabetical order, skipping macOS metadata
        folder_list = [d for d in os.listdir(self.data_dir) if d != ".DS_Store"]
        folder_list.sort(reverse=True)
        # a: every image, walking the folders in os.listdir order
        a = []
        for folder in os.listdir(self.data_dir):
            if folder == ".DS_Store":
                continue
            # os.path.join works whether or not data_dir ends with '/'
            for file in os.listdir(os.path.join(self.data_dir, folder)):
                if file == ".DS_Store":
                    continue
                a.append(folder + '/' + file)
        # b: every image, walking the folders in reverse alphabetical order
        b = []
        for folder in folder_list:
            for file in os.listdir(os.path.join(self.data_dir, folder)):
                if file == ".DS_Store":
                    continue
                b.append(folder + '/' + file)

        # note: left and right are drawn independently, so a pair can
        # occasionally come from the same person despite the 0 label
        self.write_different(a, b)


if __name__ == '__main__':
    # data_dir = "/home/ti/Downloads/DATASETS/out_data_crop/"
    # pairs_filepath = "/home/ti/Downloads/insightface/src/data/pairs.txt"
    # alternative_lst = "/home/ti/Downloads/insightface/src/data/crop.lst"
    # test_txt = "/home/ti/Downloads/DATASETS/out_data_crop/test.txt"
    # img_ext = ".png"

    # arguments to pass in command line
    parser = argparse.ArgumentParser(description='Generate an insightface .lst file (1 IMAGE_PATH LABEL per line) from a directory with one folder per person.')
    parser.add_argument('--dataset-dir', default='', help='Full path to the directory with people and their names, folder should denote the Name_Surname of the person')
    parser.add_argument('--list-file', default='', help='Full path of the .lst file to write, e.g. train.lst')
    parser.add_argument('--img-ext', default='', help='Image file extension, e.g. .jpg (stored, but not used to filter files)')
    # reading the passed arguments
    args = parser.parse_args()
    data_dir = args.dataset_dir
    lst = args.list_file
    img_ext = args.img_ext
    # generatePairs = PairGenerator(data_dir, pairs_filepath, img_ext)
    # generatePairs.write_item_label()
    # generatePairs = PairGenerator(data_dir, pairs_filepath, img_ext)
    generatePairs = PairGenerator(data_dir, lst, img_ext)
    generatePairs.write_item_label_abc() # looping through our dataset and creating 1 ITEM_PATH LABEL lst file
    # generatePairs.generate_pairs() # to use, please uncomment this line
    # generatePairs.generate_non_pairs() # to use, please uncomment this line

    # generatePairs = PairGenerator(dataset_dir, test_txt, img_ext)
    # generatePairs.split_to_10()


  • Switch to the directory containing src/data/generatelst.py and run:
    • python3 generatelst.py --dataset-dir ./insightface/datasets/lfwdata/lfw_align --list-file ./insightface/datasets/lfwdata/lfw/train.lst --img-ext '.jpg'
      • --dataset-dir is followed by the aligned image directory (the lfw_align folder), as an absolute path; this path is written as-is into every line of train.lst
      • --list-file is followed by the output path of train.lst (under the lfw folder), as an absolute path
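
Each line that generatelst.py writes has three tab-separated fields: the constant 1, the image path, and an integer label that increases by one per person folder. The sketch below is a hypothetical sanity check for a generated train.lst (read_lst and summarize are illustrative names, not part of insightface): it parses such lines and counts identities and images.

```python
from collections import Counter


def read_lst(lines):
    """Parse '<flag>\\t<image path>\\t<label>' lines into (path, label) tuples."""
    items = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 tab-separated fields, got {len(fields)}"
            )
        _flag, path, label = fields
        items.append((path, int(label)))
    return items


def summarize(items):
    """Return (number of distinct labels, number of images)."""
    per_label = Counter(label for _, label in items)
    return len(per_label), len(items)
```

For a real file, something like summarize(read_lst(open("train.lst"))) gives the identity count, which should match the first number entered in the property file in the next step.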

3. Generate the rec and idx files: stored under insightface/datasets/lfwdata/lfw

  • Create a file named property (no extension) under insightface/datasets/lfwdata/lfw;
  • Open property with vi and enter the number of IDs (how many individuals), the image height, and the image width, separated by commas; for this dataset that is 5749,112,112;
  • Run in Xshell: python face2rec2.py ./insightface/datasets/lfwdata/lfw/
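
Instead of typing the numbers into vi, the property file can be written from Python. This is only a sketch: count_identities and write_property are hypothetical helpers, and the single comma-separated line follows the 5749,112,112 example above.

```python
from pathlib import Path


def count_identities(aligned_dir):
    """Count person folders (sub-directories) in the aligned image directory."""
    return sum(1 for entry in Path(aligned_dir).iterdir() if entry.is_dir())


def write_property(lst_dir, num_ids, image_size=(112, 112)):
    """Write '<num_ids>,<height>,<width>' to a file named 'property' in lst_dir."""
    height, width = image_size
    prop_path = Path(lst_dir) / "property"
    prop_path.write_text(f"{num_ids},{height},{width}\n")
    return prop_path
```

Counting the folders rather than hard-coding 5749 keeps the file correct if alignment dropped some identities, though an empty person folder would still be counted.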

4. Generate the pair and bin files (validation set data): stored under insightface/datasets/lfwdata/lfw

(1) Generate the pair file

  • Run the following command in Xshell: python3 generate_image_pairs.py --data-dir ./insightface/datasets/lfwdata/lfw_align --outputtxt ./insightface/datasets/lfwdata/lfw/train.txt --num-samepairs 1000
    • --data-dir is followed by the aligned face directory
    • --outputtxt is the path where train.txt is saved
    • --num-samepairs is the number of pairs to generate
    • After a successful run, train.txt is written to the path given by --outputtxt (here, under datasets/lfwdata/lfw)
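
To make the idea of a pair file concrete, here is a rough sketch of sampling same-person and different-person pairs. It is not the code of generate_image_pairs.py; the function name, the num_diff parameter, the seed, and the (left, right, label) tuple layout are all assumptions chosen for illustration.

```python
import random


def sample_pairs(images_by_person, num_same, num_diff, seed=0):
    """Return (left, right, label) tuples: label 1 = same person, 0 = different.

    images_by_person maps a person name to a list of that person's image paths.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    people = [p for p, imgs in images_by_person.items() if imgs]
    multi = [p for p in people if len(images_by_person[p]) >= 2]
    if num_same and not multi:
        raise ValueError("same pairs need a person with at least two images")
    if num_diff and len(people) < 2:
        raise ValueError("different pairs need at least two people")

    pairs = []
    for _ in range(num_same):
        person = rng.choice(multi)
        # sample(…, 2) picks two distinct images of the same person
        left, right = rng.sample(images_by_person[person], 2)
        pairs.append((left, right, 1))
    for _ in range(num_diff):
        # sample(…, 2) picks two distinct people, so the 0 label is always true
        first, second = rng.sample(people, 2)
        pairs.append(
            (rng.choice(images_by_person[first]),
             rng.choice(images_by_person[second]), 0)
        )
    return pairs
```

Drawing the two people with sample rather than two independent choice calls is what guarantees a negative pair never contains the same identity twice.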

(2) Generate the bin file

  • Run the following command in Xshell: python3 lfw2pack.py --data-dir ./insightface/datasets/lfwdata/lfw --output ./insightface/datasets/lfwdata/lfw/lfw.bin --num-samepairs 1000
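
As far as I understand the repository's verification code, the bin file is a pickled tuple: a flat list of encoded image bytes (two entries per pair) and a list of same/different flags. Treat that layout as an assumption and check it against your copy of lfw2pack.py. The sketch below packs and unpacks such a structure in memory; pack_pairs and unpack_pairs are hypothetical names.

```python
import pickle


def pack_pairs(pairs):
    """Pack (left_bytes, right_bytes, is_same) triples into a pickled blob.

    Assumed layout: (bins, issame), where bins holds two entries per pair.
    """
    bins, issame = [], []
    for left, right, same in pairs:
        bins.append(left)
        bins.append(right)
        issame.append(bool(same))
    return pickle.dumps((bins, issame), protocol=pickle.HIGHEST_PROTOCOL)


def unpack_pairs(blob):
    """Inverse of pack_pairs: return a list of (left, right, is_same) triples."""
    bins, issame = pickle.loads(blob)
    if len(bins) != 2 * len(issame):
        raise ValueError("expected exactly two images per same/different flag")
    return [(bins[2 * i], bins[2 * i + 1], issame[i]) for i in range(len(issame))]
```

In real use the byte strings would be the contents of the aligned .jpg files read in binary mode, and the blob would be written to lfw.bin.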

Origin blog.csdn.net/weixin_38192254/article/details/103861231