Face detection using Detectron2 and PyTorch on a custom dataset

This article shows how to use Python to fine-tune a pre-trained object detection model on a custom face detection dataset. You'll learn how to prepare a custom face detection dataset for Detectron2 and PyTorch, and how to fine-tune a pre-trained model to find face boundaries in images.

Face detection is the task of finding the bounding boxes of human faces in an image. This is useful in situations such as:

  • Security systems (the first step in identifying people)

  • Autofocus and smile detection for great photos

  • Age, race, and emotion detection for marketing purposes


Historically, this was a very thorny problem to solve. Extensive manual feature engineering, along with novel algorithms and methods, was developed to improve the state of the art.

Face detection models are included in almost every computer vision package/framework these days. Some of the best performing models use deep learning methods. For example, OpenCV provides various tools such as cascade classifiers.

In this guide, you'll learn how to:

  • Prepare a custom dataset for face detection to be used in Detectron2

  • Find faces in images using (close to) state-of-the-art object detection models

  • Extend this work to face recognition

Detectron2

Detectron2 is a framework for building state-of-the-art object detection and image segmentation models, developed by the Facebook Research team. It is a complete rewrite of the first version. Detectron2 uses PyTorch (compatible with the latest version) and allows for super-fast training. You can learn more in Facebook Research's getting started blog post.

The real power of Detectron2 lies in the large number of pre-trained models available in the model zoo. But what good is that if you can't fine-tune them on your own dataset? Fortunately, it's very easy! In this guide, we'll see how to get it done.

Install Detectron2

At the time of writing, Detectron2 is still in alpha. Although there is an official release, we will clone and compile from the master branch, which should be equivalent to version 0.1. Let's install some requirements first:

!pip install -q cython pyyaml==5.1
!pip install -q -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

Then, download, compile and install the Detectron2 package: 

!git clone https://github.com/facebookresearch/detectron2 detectron2_repo 
!pip install -q -e detectron2_repo

At this point, you need to restart the notebook runtime to continue!

%reload_ext watermark
%watermark -v -p numpy,pandas,pycocotools,torch,torchvision,detectron2
CPython 3.6.9
IPython 5.5.0
numpy 1.17.5
pandas 0.25.3
pycocotools 2.0
torch 1.4.0
torchvision 0.5.0
detectron2 0.1
import torch, torchvision
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

import glob
import os
import ntpath
import numpy as np
import cv2
import random
import itertools
import pandas as pd
from tqdm import tqdm
import urllib.request
import json
import PIL.Image as Image

from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor, DefaultTrainer
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer, ColorMode
from detectron2.data import DatasetCatalog, MetadataCatalog, build_detection_test_loader
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.structures import BoxMode

import seaborn as sns
from pylab import rcParams
import matplotlib.pyplot as plt
from matplotlib import rc

%matplotlib inline
%config InlineBackend.figure_format='retina'

sns.set(style='whitegrid', palette='muted', font_scale=1.2)

HAPPY_COLORS_PALETTE = ["#01BEFE", "#FFDD00", "#FF7D00", "#FF006D", "#ADFF02", "#8F00FF"]
sns.set_palette(sns.color_palette(HAPPY_COLORS_PALETTE))

rcParams['figure.figsize'] = 12, 8

RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)

Face Detection Data

The dataset is freely available and in the public domain. It's provided by Dataturks and hosted on Kaggle: Faces labeled with bounding boxes in images. It contains about 500 images with roughly 1,100 faces manually labeled with bounding boxes.

I've downloaded the JSON file containing the annotations and uploaded it to Google Drive. Let's fetch it:

!gdown --id 1K79wJgmPTWamqb04Op2GxW0SW9oxw8KS

Let's load the file into a Pandas dataframe:

faces_df = pd.read_json('face_detection.json', lines=True)

Each row contains a single face annotation. Note that multiple rows may refer to the same image (e.g., an image with multiple faces).
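For reference, a single row looks roughly like this (the field names match what the preprocessing code below reads; the values are made up for illustration):

# Illustrative structure of one faces_df row (values are made up):
{
  "content": "https://example.com/image_1.jpg",  # URL of the image to download
  "annotation": [                                # one entry per labeled face
    {
      "imageWidth": 1280,
      "imageHeight": 720,
      "points": [
        {"x": 0.12, "y": 0.25},                  # top-left corner (normalized to [0, 1])
        {"x": 0.31, "y": 0.58}                   # bottom-right corner (normalized to [0, 1])
      ]
    }
  ]
}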

Data preprocessing

The dataset only contains image URLs and annotations. We will have to download these images. We'll also normalize annotations to make them easier to use later in Detectron2:

os.makedirs("faces", exist_ok=True)


dataset = []


for index, row in tqdm(faces_df.iterrows(), total=faces_df.shape[0]):
    img = urllib.request.urlopen(row["content"])
    img = Image.open(img)
    img = img.convert('RGB')


    image_name = f'face_{index}.jpeg'


    img.save(f'faces/{image_name}', "JPEG")


    annotations = row['annotation']
    for an in annotations:


      data = {}


      width = an['imageWidth']
      height = an['imageHeight']
      points = an['points']


      data['file_name'] = image_name
      data['width'] = width
      data['height'] = height


      data["x_min"] = int(round(points[0]["x"] * width))
      data["y_min"] = int(round(points[0]["y"] * height))
      data["x_max"] = int(round(points[1]["x"] * width))
      data["y_max"] = int(round(points[1]["y"] * height))


      data['class_name'] = 'face'


      dataset.append(data)

Let's put the data into a data frame so we can have a better look:

df = pd.DataFrame(dataset)
print(df.file_name.unique().shape[0], df.shape[0])
409 1132

We have a total of 409 images (far fewer than the promised 500) and 1132 annotations. Let's save them to disk (so you can reuse them):

df.to_csv('annotations.csv', header=True, index=None)

Let's look at some sample annotation data. We'll use OpenCV to load an image, add the bounding boxes, and resize it. We'll define a helper function to do all this:

def annotate_image(annotations, resize=True):
  # All rows in `annotations` refer to the same image
  file_name = annotations.file_name.to_numpy()[0]
  img = cv2.cvtColor(cv2.imread(f'faces/{file_name}'), cv2.COLOR_BGR2RGB)

  # Draw a green bounding box for every annotated face
  for i, a in annotations.iterrows():
    cv2.rectangle(img, (a.x_min, a.y_min), (a.x_max, a.y_max), (0, 255, 0), 2)

  if not resize:
    return img

  return cv2.resize(img, (384, 384), interpolation=cv2.INTER_AREA)

Let's first display some annotated images:
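Here's a minimal sketch of how to produce them with the annotate_image() helper and matplotlib (which sample files to show is an arbitrary choice):

# Display full-size annotations for a couple of sample images
for file_name in df.file_name.unique()[:2]:
  annotations = df[df.file_name == file_name]
  img = annotate_image(annotations, resize=False)
  plt.imshow(img)
  plt.axis('off')
  plt.show()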

[Two sample images with annotated face bounding boxes]

These are nice images and the annotations are clearly visible. We can use torchvision to create a grid of images. Note that the images are of different sizes, so we'll resize them:
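Here's one way to build that grid with torchvision.utils.make_grid() (a sketch; the sample size of 10 and the 5-per-row layout are arbitrary choices):

# Annotate and resize a sample of images, then arrange them in a grid
sample_images = [annotate_image(df[df.file_name == f]) for f in df.file_name.unique()[:10]]
sample_images = torch.as_tensor(np.array(sample_images))  # (N, H, W, C), uint8
sample_images = sample_images.permute(0, 3, 1, 2)          # make_grid expects (N, C, H, W)

grid_img = torchvision.utils.make_grid(sample_images, nrow=5)

plt.figure(figsize=(24, 12))
plt.imshow(grid_img.permute(1, 2, 0))  # back to (H, W, C) for matplotlib
plt.axis('off')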

[Grid of resized sample images with face annotations]

You can clearly see that some annotations are missing (fourth column). That's real-world data for you; sometimes you have to deal with it one way or another.

Face detection with Detectron2

We'll now go through the steps required to fine-tune a model on a custom dataset. But first, let's set aside 5% of the data for testing:

df = pd.read_csv('annotations.csv')

IMAGES_PATH = 'faces'

unique_files = df.file_name.unique()

train_files = set(np.random.choice(unique_files, int(len(unique_files) * 0.95), replace=False))
train_df = df[df.file_name.isin(train_files)]
test_df = df[~df.file_name.isin(train_files)]

The classic train-test split doesn't apply here, since we want to split by file name: multiple annotation rows can refer to the same image, and those must all land in the same split.

The next sections are written in a somewhat generic fashion. Obviously, we only have a single class: face. However, adding more classes should be as simple as adding more annotations to the dataframe:

classes = df.class_name.unique().tolist()

Next, we'll write a function that converts our dataset into the format used by Detectron2:

def create_dataset_dicts(df, classes):
  dataset_dicts = []
  for image_id, img_name in enumerate(df.file_name.unique()):
    record = {}

    # All annotation rows for this image
    image_df = df[df.file_name == img_name]

    file_path = f'{IMAGES_PATH}/{img_name}'
    record["file_name"] = file_path
    record["image_id"] = image_id
    record["height"] = int(image_df.iloc[0].height)
    record["width"] = int(image_df.iloc[0].width)

    objs = []
    for _, row in image_df.iterrows():
      xmin = int(row.x_min)
      ymin = int(row.y_min)
      xmax = int(row.x_max)
      ymax = int(row.y_max)

      # Build a rectangular polygon matching the bounding box,
      # flattened into [x1, y1, x2, y2, ...] as Detectron2 expects
      poly = [
          (xmin, ymin), (xmax, ymin),
          (xmax, ymax), (xmin, ymax)
      ]
      poly = list(itertools.chain.from_iterable(poly))

      obj = {
        "bbox": [xmin, ymin, xmax, ymax],
        "bbox_mode": BoxMode.XYXY_ABS,
        "segmentation": [poly],
        "category_id": classes.index(row.class_name),
        "iscrowd": 0
      }
      objs.append(obj)

    record["annotations"] = objs
    dataset_dicts.append(record)
  return dataset_dicts

The function converts every annotation row into a single record with a list of annotations. You might also notice that we're building a polygon with the exact same shape as the bounding box; this is required by the image segmentation models in Detectron2.
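To make the format concrete, a single record produced by this function looks roughly like this (the values are illustrative):

# One entry of dataset_dicts, roughly (values are illustrative):
{
  "file_name": "faces/face_0.jpeg",
  "image_id": 0,
  "height": 720,
  "width": 1280,
  "annotations": [
    {
      "bbox": [154, 180, 397, 418],  # absolute pixel coordinates
      "bbox_mode": BoxMode.XYXY_ABS,
      "segmentation": [[154, 180, 397, 180, 397, 418, 154, 418]],
      "category_id": 0,              # index into `classes`
      "iscrowd": 0
    }
  ]
}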

You have to register the dataset into the dataset and metadata catalogs:

for d in ["train", "val"]:
  DatasetCatalog.register("faces_" + d, lambda d=d: create_dataset_dicts(train_df if d == "train" else test_df, classes))
  MetadataCatalog.get("faces_" + d).set(thing_classes=classes)


statement_metadata = MetadataCatalog.get("faces_train")

Unfortunately, evaluation on a test set isn't included by default. We can easily fix this by writing our own trainer:

class CocoTrainer(DefaultTrainer):

  @classmethod
  def build_evaluator(cls, cfg, dataset_name, output_folder=None):
    if output_folder is None:
      os.makedirs("coco_eval", exist_ok=True)
      output_folder = "coco_eval"

    return COCOEvaluator(dataset_name, cfg, False, output_folder)

If no folder is provided, evaluation results will be stored in the coco_eval folder.

Fine-tuning a Detectron2 model is quite different from writing regular PyTorch training code: we'll load a configuration file, change a few values, and start the training process. But hey, it really helps if you know what you're doing. In this tutorial, we'll use the Mask R-CNN X101-FPN model. It's pre-trained on the COCO dataset and performs very well. The downside is that it's slow to train.

Let's load the config file and pretrained model weights:

cfg = get_cfg()

cfg.merge_from_file(
  model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml"
  )
)

cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
  "COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml"
)

Specify the datasets we'll use for training and evaluation (the ones we registered earlier):

cfg.DATASETS.TRAIN = ("faces_train",)
cfg.DATASETS.TEST = ("faces_val",)
cfg.DATALOADER.NUM_WORKERS = 4

As for the optimizer, we'll do a bit of magic to get it to converge to something good:

cfg.SOLVER.IMS_PER_BATCH = 4
cfg.SOLVER.BASE_LR = 0.001
cfg.SOLVER.WARMUP_ITERS = 1000
cfg.SOLVER.MAX_ITER = 1500
cfg.SOLVER.STEPS = (1000, 1500)
cfg.SOLVER.GAMMA = 0.05

Besides the standard stuff (batch size, max iterations, and learning rate), we have a couple of interesting parameters:

  • WARMUP_ITERS - the learning rate starts at 0 and ramps up to the preset value over this number of iterations

  • STEPS - the iteration milestones at which the learning rate is scaled down by GAMMA

Finally, we'll specify the number of classes and how often we'll evaluate on the test set (in iterations):

cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 64
cfg.MODEL.ROI_HEADS.NUM_CLASSES = len(classes)

cfg.TEST.EVAL_PERIOD = 500

Time to start training, using our custom trainer:

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

trainer = CocoTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

Evaluate object detection models

Evaluating an object detection model is a bit different from evaluating a standard classification or regression model. The main metric you need to know about is IoU (Intersection over Union). It measures the degree of overlap between two boundaries, predicted and true, and takes values between 0 and 1.

[Illustration of IoU: the area of intersection of two boxes divided by the area of their union]

Using IoU, you can define a threshold (e.g., >0.5) to classify whether a prediction is a true positive (TP) or a false positive (FP). You can then calculate the average precision (AP) by taking the area under the precision-recall curve. AP@X (e.g., AP50) is simply the AP at IoU threshold X. This should give you a working understanding of how object detection models are evaluated.
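As a quick illustration, here's a minimal IoU computation for two boxes in [x_min, y_min, x_max, y_max] format (just a sketch for intuition; the COCOEvaluator we use below handles all of this for you):

def iou(box_a, box_b):
  # Boxes are [x_min, y_min, x_max, y_max]
  x_min = max(box_a[0], box_b[0])
  y_min = max(box_a[1], box_b[1])
  x_max = min(box_a[2], box_b[2])
  y_max = min(box_a[3], box_b[3])

  # Intersection area (zero if the boxes don't overlap)
  intersection = max(0, x_max - x_min) * max(0, y_max - y_min)

  area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
  area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])

  return intersection / (area_a + area_b - intersection)

iou([0, 0, 10, 10], [5, 5, 15, 15])  # 25 / 175 = ~0.14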

I've prepared a pre-trained model, so you don't have to wait for the training to finish. Let's download it:

!gdown --id 18Ev2bpdKsBaDufhVKf0cT6RmM3FjW3nL 
!mv face_detector.pth output/model_final.pth

We can start making predictions by loading the model and setting a minimum confidence threshold of 85% for a prediction to be kept:

cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.85
predictor = DefaultPredictor(cfg)

Run the evaluator with the trained model:

evaluator = COCOEvaluator("faces_val", cfg, False, output_dir="./output/")
val_loader = build_detection_test_loader(cfg, "faces_val")
inference_on_dataset(trainer.model, val_loader, evaluator)

Find faces in images

Next, let's create a folder and save all images from the test set with their predicted annotations:

os.makedirs("annotated_results", exist_ok=True)


test_image_paths = test_df.file_name.unique()
for clothing_image in test_image_paths:
  file_path = f'{IMAGES_PATH}/{clothing_image}'
  im = cv2.imread(file_path)
  outputs = predictor(im)
  v = Visualizer(
    im[:, :, ::-1],
    metadata=statement_metadata,
    scale=1.,
    instance_mode=ColorMode.IMAGE
  )
  instances = outputs["instances"].to("cpu")
  instances.remove('pred_masks')
  v = v.draw_instance_predictions(instances)
  result = v.get_image()[:, :, ::-1]
  file_name = ntpath.basename(clothing_image)
  write_res = cv2.imwrite(f'annotated_results/{file_name}', result)

[Sample test image with predicted face bounding boxes]
