0. Preface
This article describes how to use the deep residual network (ResNet) in dlib to implement real-time face recognition. The basic development environment is as follows:
Installed software | Version |
---|---|
CUDA | 10.2.89 |
cuDNN | 8.0.0.180 |
OpenCV | 4.4.0 |
TensorFlow | 2.3.1 |
JetPack | 4.4.1 |
Platform | Jetson Nano |
I previously implemented face detection with OpenCV and face recognition with the dlib-based face_recognition module, but the recognition accuracy was not ideal. Asian faces in particular were easily mistaken for the same person. This article instead uses the deep residual network (ResNet) in dlib for face recognition. Note that it does not cover building a deep residual network; it relies on existing pre-trained models to implement this function.
1. Resource preparation
Download the required models and parameters from the official dlib website: http://dlib.net/files/
The three models are then loaded as follows:
detector = dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat')
sp = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')
facerec = dlib.face_recognition_model_v1('dlib_face_recognition_resnet_model_v1.dat')
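The files on that page are distributed as bz2 archives (for example `shape_predictor_68_face_landmarks.dat.bz2`), so they have to be decompressed before the calls above can load them. Here is a minimal sketch using only the Python standard library. The helper name `decompress_bz2` is my own and is not part of dlib, and the script assumes the archives have already been downloaded into the current directory.

```python
import bz2
import os
import shutil

def decompress_bz2(src_path, dst_path=None):
    """Decompress a .bz2 file and return the path of the decompressed file."""
    if dst_path is None:
        # 'model.dat.bz2' -> 'model.dat'
        dst_path = src_path[:-4] if src_path.endswith('.bz2') else src_path + '.out'
    with bz2.open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        # Copy in chunks so that large model files are never held fully in memory
        shutil.copyfileobj(src, dst)
    return dst_path

if __name__ == '__main__':
    for name in ('mmod_human_face_detector.dat.bz2',
                 'shape_predictor_68_face_landmarks.dat.bz2',
                 'dlib_face_recognition_resnet_model_v1.dat.bz2'):
        if os.path.exists(name):
            print('decompressed to', decompress_bz2(name))
```

Streaming the copy matters on a small board such as the Jetson Nano, where memory is limited.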
2. Coding
2.1. Facial data classification: save facial feature vectors and labels locally
The pre-trained ResNet model is used to compute the feature data of each face, and the feature data and the corresponding name labels are saved to local files for use in real-time face recognition. What exactly is a face feature vector? I am still not entirely clear on this. I only know that it is a 128-dimensional vector (see the reshape to (1, 128) in the code) that expresses a person's facial features.
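For intuition, a descriptor can be treated as a point in 128-dimensional space: descriptors of the same person should lie close together, and those of different people further apart. The dlib documentation for this model suggests a Euclidean distance of 0.6 as the dividing line. Below is a minimal sketch with synthetic vectors standing in for real descriptors. The function name `same_person` is hypothetical, and the vectors are made up only to show the arithmetic.

```python
import numpy as np

def same_person(desc_a, desc_b, threshold=0.6):
    """Return (is_match, distance) for two 128-D face descriptors."""
    distance = float(np.linalg.norm(np.asarray(desc_a) - np.asarray(desc_b)))
    return distance < threshold, distance

# Synthetic stand-ins for real descriptors (assumption: not produced by dlib)
rng = np.random.default_rng(0)
anchor = rng.normal(size=128)
anchor /= np.linalg.norm(anchor)   # unit length
near = anchor + 0.01               # every component shifted by 0.01
far = -anchor                      # the opposite point on the unit sphere

print(same_person(anchor, near))   # distance is 0.01 * sqrt(128), about 0.113
print(same_person(anchor, far))    # distance is 2.0
```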
import os
import cv2
import dlib
import numpy as np
import json

# Load the face detector, the landmark predictor and the ResNet recognition model
detector = dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat')
sp = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')
facerec = dlib.face_recognition_model_v1('dlib_face_recognition_resnet_model_v1.dat')

imagePATH = '/home/colin/works/face_recognition_resnet/data/'
data = np.zeros((1, 128))  # placeholder first row, removed before saving
labels = []

for file in os.listdir(imagePATH):
    if '.jpg' in file or '.png' in file:
        # The part of the file name before the first '_' is used as the label
        labelName = file.split('_')[0]
        print('current image:', file)
        print('current label:', labelName)
        img = cv2.imread(imagePATH + file)
        if img is None:  # skip files that OpenCV cannot decode
            continue
        # Halve large images to reduce detection time
        if img.shape[0] * img.shape[1] > 500000:
            img = cv2.resize(img, (0, 0), fx=0.5, fy=0.5)
        dets = detector(img, 1)
        for k, d in enumerate(dets):
            rec = dlib.rectangle(d.rect.left(), d.rect.top(), d.rect.right(), d.rect.bottom())
            shape = sp(img, rec)
            face_descriptor = facerec.compute_face_descriptor(img, shape)
            faceArray = np.array(face_descriptor).reshape((1, 128))
            data = np.concatenate((data, faceArray))
            labels.append(labelName)
            # Pass two corner points; a single 4-tuple would be read as (x, y, width, height)
            cv2.rectangle(img, (rec.left(), rec.top()), (rec.right(), rec.bottom()), (0, 255, 0), 2)
        # Show the image first, then let waitKey process the window events
        cv2.imshow('img', img)
        cv2.waitKey(2)

data = data[1:, :]  # drop the placeholder row
np.savetxt('faceData.txt', data, fmt='%f')
with open('labels.txt', 'w') as labelFile:
    json.dump(labels, labelFile)
cv2.destroyAllWindows()
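Before moving on, it can be worth checking that the two saved files load back consistently. The sketch below writes and reloads a small synthetic data set in the same formats as the script above. The temporary directory and the helper names `save_faces` and `load_faces` are assumptions for illustration. Note that `np.loadtxt` returns a 1-D array when the file contains a single row, so `np.atleast_2d` is used to keep the (n, 128) shape that the distance calculation in section 2.3 relies on.

```python
import json
import os
import tempfile

import numpy as np

def save_faces(directory, data, labels):
    """Write descriptors and labels in the same formats as the script above."""
    np.savetxt(os.path.join(directory, 'faceData.txt'), data, fmt='%f')
    with open(os.path.join(directory, 'labels.txt'), 'w') as f:
        json.dump(labels, f)

def load_faces(directory):
    """Load descriptors as an (n, 128) array together with their n labels."""
    with open(os.path.join(directory, 'labels.txt'), 'r') as f:
        labels = json.load(f)
    raw = np.loadtxt(os.path.join(directory, 'faceData.txt'), dtype=float)
    data = np.atleast_2d(raw)  # a single saved face would otherwise have shape (128,)
    if data.shape[0] != len(labels):
        raise ValueError('number of descriptors and labels differ')
    return data, labels

# Synthetic example with a single registered face
tmp_dir = tempfile.mkdtemp()
one_face = np.full((1, 128), 0.25)
save_faces(tmp_dir, one_face, ['alice'])
loaded, names = load_faces(tmp_dir)
print(loaded.shape, names)
```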
2.2. Face detection
detector = dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat')
2.3. Face recognition
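import json
import cv2
import dlib
import numpy as np

# The models are loaded in the same way as in section 2.1
detector = dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat')
sp = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')
facerec = dlib.face_recognition_model_v1('dlib_face_recognition_resnet_model_v1.dat')

# Assumed values: the original code does not show how these two globals are set
threshold = 0.6  # distance threshold suggested in the dlib documentation for this model
filePATH = './'  # directory that contains faceData.txt and labels.txt

# Candidate resolutions: 640x480, 320x240
def gstreamer_pipeline(
    capture_width=320,
    capture_height=240,
    display_width=320,
    display_height=240,
    framerate=30,
    flip_method=0,
):
    # Build the GStreamer pipeline string for the CSI camera
    return (
        "nvarguscamerasrc ! "
        "video/x-raw(memory:NVMM), "
        "width=(int)%d, height=(int)%d, "
        "format=(string)NV12, framerate=(fraction)%d/1 ! "
        "nvvidconv flip-method=%d ! "
        "video/x-raw, width=(int)%d, height=(int)%d, format=(string)BGRx ! "
        "videoconvert ! "
        "video/x-raw, format=(string)BGR ! appsink"
        % (
            capture_width,
            capture_height,
            framerate,
            flip_method,
            display_width,
            display_height,
        )
    )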
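def findNearestClassForImage(face_descriptor, faceLabel):
    global threshold
    # Euclidean distance from this descriptor to every saved descriptor
    temp = np.array(face_descriptor) - data
    e = np.linalg.norm(temp, axis=1, keepdims=True)
    min_distance = e.min()
    print('distance: ', min_distance)
    # Faces that are too far from every saved descriptor are reported as unknown
    if min_distance > threshold:
        return 'unknown'
    index = np.argmin(e)
    return faceLabel[index]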
def recognition(img):
    dets = detector(img, 1)
    for k, d in enumerate(dets):
        print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
            k, d.rect.left(), d.rect.top(), d.rect.right(), d.rect.bottom()))
        rec = dlib.rectangle(d.rect.left(), d.rect.top(), d.rect.right(), d.rect.bottom())
        print(rec.left(), rec.top(), rec.right(), rec.bottom())
        # Locate the landmarks, compute the descriptor, then look up the nearest saved face
        shape = sp(img, rec)
        face_descriptor = facerec.compute_face_descriptor(img, shape)
        class_pre = findNearestClassForImage(face_descriptor, label)
        print(class_pre)
        cv2.rectangle(img, (rec.left(), rec.top() + 10), (rec.right(), rec.bottom()), (0, 255, 0), 2)
        cv2.putText(img, class_pre, (rec.left(), rec.top()), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2, cv2.LINE_AA)
        # image_shop is a helper module that is not shown in this article;
        # uncomment the next line only if you have it available
        # img = image_shop.mark_add(rec.left(), rec.right(), rec.top(), rec.bottom(), img)
    return img
def data_load():
    # Load the labels and descriptors saved in section 2.1
    global label, data, filePATH
    with open(filePATH + 'labels.txt', 'r') as labelFile:
        label = json.load(labelFile)
    data = np.loadtxt(filePATH + 'faceData.txt', dtype=float)
def face_recognition_livevideo(window_name, camera_idx):
    cv2.namedWindow(window_name)
    # Open the CSI camera through the GStreamer pipeline
    cap = cv2.VideoCapture(gstreamer_pipeline(flip_method=camera_idx), cv2.CAP_GSTREAMER)
    while cap.isOpened():
        ok, frame = cap.read()  # read one frame
        if not ok:
            break
        resImage = recognition(frame)
        # Display the annotated frame
        cv2.imshow(window_name, resImage)
        c = cv2.waitKey(1)
        if c & 0xFF == ord('q'):
            break
    # Release the camera and close the window
    cap.release()
    cv2.destroyAllWindows()
if __name__ == '__main__':
    data_load()
    face_recognition_livevideo('Find Face', 0)
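The matching step can be tried without a camera or the dlib models. The sketch below reimplements the nearest-neighbour lookup from `findNearestClassForImage` as a pure function with explicit arguments instead of globals, and runs it on a tiny synthetic gallery. The name `nearest_label`, the 0.6 threshold and the data are assumptions for illustration only.

```python
import numpy as np

def nearest_label(descriptor, gallery, labels, threshold=0.6):
    """Return (label, distance) of the closest gallery row, or ('unknown', distance)."""
    gallery = np.atleast_2d(gallery)  # also accept a gallery with a single face
    distances = np.linalg.norm(gallery - np.asarray(descriptor), axis=1)
    index = int(np.argmin(distances))
    if distances[index] > threshold:
        return 'unknown', float(distances[index])
    return labels[index], float(distances[index])

# Two synthetic 128-D "faces": all zeros and all 0.1 (about 1.13 apart)
gallery = np.vstack([np.zeros(128), np.full(128, 0.1)])
labels = ['alice', 'bob']

print(nearest_label(np.full(128, 0.09), gallery, labels))  # close to the second row
print(nearest_label(np.full(128, 1.0), gallery, labels))   # far from both rows
```

Returning the distance alongside the label makes it easier to tune the threshold on your own data than printing it inside the function.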
2.4. GPU acceleration
dlib can use CUDA for acceleration. If CUDA is not enabled in your dlib build, you may need to reinstall dlib, adding "-DDLIB_USE_CUDA=1" when compiling and installing. You can refer to my previous blog post about the installation of the dlib library.
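To check whether the installed dlib was built with CUDA, the Python module exposes a `DLIB_USE_CUDA` flag and a `dlib.cuda.get_num_devices()` function. Here is a small sketch. The function name `dlib_cuda_status` is my own, and it returns None when dlib is not installed so that it can run on any machine.

```python
def dlib_cuda_status():
    """Return None if dlib is missing, else (built_with_cuda, device_count)."""
    try:
        import dlib
    except ImportError:
        return None
    built_with_cuda = bool(dlib.DLIB_USE_CUDA)
    # Only query the devices when the build actually has CUDA support
    device_count = dlib.cuda.get_num_devices() if built_with_cuda else 0
    return built_with_cuda, device_count

print(dlib_cuda_status())
```

If the first value is False, dlib falls back to the CPU and the CNN detector will be much slower on the Jetson Nano.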
3. Demo results
The results are good, and dlib makes good use of CUDA for the computation. Because the work is offloaded to the GPU, the CPU load does not become too high.