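I. Background

This article builds on my previous one: Deep Learning (1): Real-Time Facial Expression Recognition from a Camera with the deepNN Model (Mixed C++ and Python 3.6 Programming). That model was trained on the fer2013 dataset. As noted there, a model trained on fer2013 recognizes facial expressions with roughly 70% accuracy, and the dataset covers only 7 common expression classes, so it cannot recognize a wider range of expressions. To address this, in this article I collect my own facial expression data (the collection method is described in my article Building a Facial Expression Dataset from Scratch), train a model on it, and perform real-time detection of exaggerated facial expressions. The model is still deepNN. The overall file structure is listed at the end of the article.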

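Main references:

[1] Deep Learning (1): Real-Time Facial Expression Recognition from a Camera with the deepNN Model (Mixed C++ and Python 3.6 Programming)

[2] Building a Facial Expression Dataset from Scratch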

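II. Dataset Preparation

1. The haarcascade_frontalface_default.xml file

The approach is the same as in the previous article and again relies on the haarcascade_frontalface_default.xml file. How to obtain this file is explained in detail in the previous article, so I will not repeat it here.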

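2. Exaggerated facial expression dataset

First, design 10 classes of exaggerated facial expressions. I chose surprised (chijing), crying (daku), happy (gaoxing), pouting (juezui), frowning (zhoumei), looking up (taitou), looking down (ditou), looking left (xiangzuokan), looking right (xiangyoukan), and melancholy (youyu). The key step is then obtaining data for these 10 classes. For how to collect and build the expression dataset, see Building a Facial Expression Dataset from Scratch. Note that the looking-left and looking-right images must not be mirrored, because a horizontal flip turns one class into the other.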
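
To make the mirroring caveat concrete, here is a minimal sketch of a flip augmentation that skips the direction-sensitive classes. The names `NO_MIRROR` and `augment_with_flip` are hypothetical and not part of the original scripts. Images are assumed to be NumPy arrays whose second axis is the width.

```python
import numpy as np

# Classes whose meaning changes under a horizontal flip.
# The folder names follow the class list above; the set itself is an assumption.
NO_MIRROR = {"xiangzuokan", "xiangyoukan"}

def augment_with_flip(image, class_name):
    """Return the original image, plus a mirrored copy when mirroring is safe."""
    if class_name in NO_MIRROR:
        return [image]
    return [image, np.fliplr(image)]
```

Any other augmentation that does not change left/right orientation (brightness changes, small shifts) could still be applied to all 10 classes.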

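After the expression images have been generated automatically, some simple manual cleanup is still needed, mainly removing images that are clearly not faces. Then try to keep the number of images per class roughly equal. Here I use 100 images per class.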
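
A quick way to check the class balance is to count the image files in each class folder. The sketch below assumes the layout data/<class>/<image>. The function name `count_images` and the extension list are my own choices, not part of the original scripts.

```python
import os

def count_images(data_root, exts=(".jpg", ".jpeg", ".png")):
    """Map each class sub-folder of data_root to its number of image files."""
    counts = {}
    for name in sorted(os.listdir(data_root)):
        folder = os.path.join(data_root, name)
        if not os.path.isdir(folder):
            continue  # skip plain files such as list.txt
        counts[name] = sum(
            1 for f in os.listdir(folder) if f.lower().endswith(exts)
        )
    return counts
```

Calling `count_images('data/')` after the manual cleanup shows at a glance which classes need more or fewer images.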

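III. Model Implementation

1. Creating the data labels

Because the model treats the task as classification, every class of expression data needs a label. My approach is to read the images in each folder and save each image path together with its label in a txt file. Here is the code: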

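# Generate the image-and-label list file: https://blog.csdn.net/u010682375/article/details/77746489
import os

def generate(dir, label):
    files = os.listdir(dir)
    files.sort()
    print('start...')

    # Build the path with os.path.join so it works on any OS
    listText = open(os.path.join(dir, 'zzz_list.txt'), 'w')
    for file in files:
        # Skip text files (e.g. a zzz_list.txt left over from an earlier run)
        if os.path.splitext(file)[1] == '.txt':
            continue
        # Each line is "<dir><file name> <label>"; dir must end with '/'
        name = file + ' ' + str(int(label)) + '\n'
        listText.write(dir + name)
    listText.close()
    print('done!')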


if __name__ == '__main__':
    generate('data/chijing/', 0)
    generate('data/daku/', 1)
    generate('data/gaoxing/', 2)
    generate('data/juezui/', 3)
    generate('data/zhoumei/', 4)
    generate('data/taitou/', 5)
    generate('data/ditou/', 6)
    generate('data/xiangzuokan/', 7)
    generate('data/xiangyoukan/', 8)
    generate('data/youyu/', 9)

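There are 10 expressions, so there are 10 labels, numbered 0 to 9. Once the program above is written, run it, and a txt file is generated in each expression folder. Taking the surprised expression as an example, open zzz_list.txt under 'data/chijing/'. It records the path and label of every surprised-expression image, one image per line.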

Next, manually merge the txt files of the 10 classes into a single txt file named list.txt under 'data/'. With that, all images and labels are ready.
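
The manual merge can also be scripted. The sketch below assumes each class folder under the data root contains a zzz_list.txt in the "path label" format described above. The function name `merge_lists` and its parameters are hypothetical.

```python
import os

def merge_lists(data_root, per_class_name="zzz_list.txt", out_name="list.txt"):
    """Concatenate every <class>/zzz_list.txt under data_root into one list file.

    Returns the number of lines written.
    """
    merged = []
    for name in sorted(os.listdir(data_root)):
        path = os.path.join(data_root, name, per_class_name)
        if os.path.isfile(path):
            with open(path, "r") as f:
                # Drop blank lines so the merged file has one image per line
                merged.extend(line.strip() for line in f if line.strip())
    with open(os.path.join(data_root, out_name), "w") as f:
        f.writelines(line + "\n" for line in merged)
    return len(merged)
```

Folders are visited in alphabetical order rather than label order. That does not matter for correctness, because every line carries its own label.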

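2. Batch-reading the data

With the dataset and labels ready, the next step is to write the data-loading function. Given the list.txt file, it loads every image listed there together with its label. Here is the code: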

import numpy as np
from PIL import Image

def load_data(txt_dir):

    # Read the txt file line by line; the with-block closes the file afterwards
    with open(txt_dir, 'r') as fopen:
        lines = fopen.read().splitlines()
    count = len(lines)                  # number of lines (images) in the txt file

    data_set = np.empty((count, 128, 128, 1), dtype="float32")
    label = np.zeros((count, 10), dtype="uint8")

    i = 0
    for line in lines:

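        line = line.split(" ")          # split into path and label at the space

        img = Image.open(line[0])       # images are expected to be 128x128 already
        print(i, img.size)
        # img = skimage.io.image(line[0])
        label[i, int(line[1])] = 1      # one-hot encode the label

        img = img.convert('L')          # convert to grayscale
        # Scale pixels to [0, 1] so training matches image_to_tensor in the test script
        array = np.asarray(img, dtype="float32") / 255.0
        data_set[i, :, :, 0] = array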

        i += 1

    return data_set, label


if __name__ == '__main__':
    txt_dir = 'data/list.txt'
    data_set, label = load_data(txt_dir)
    print(data_set.shape)
    print(label.shape)

Once written, the code above can be run directly. If the code and the txt file are both correct, it prints the shapes of the data and label arrays.
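
The loader stores each label as a one-hot row: label k becomes a length-10 vector with a 1 at index k. The standalone sketch below shows that encoding and how to recover the class index with argmax. The helper name `one_hot` is hypothetical.

```python
import numpy as np

def one_hot(indices, num_classes=10):
    """Build a (len(indices), num_classes) uint8 matrix with a single 1 per row."""
    labels = np.zeros((len(indices), num_classes), dtype="uint8")
    labels[np.arange(len(indices)), indices] = 1
    return labels

encoded = one_hot([0, 7, 9])
print(encoded.shape)           # (3, 10)
print(encoded.argmax(axis=1))  # [0 7 9]
```

The same argmax check can be applied to the `label` array returned by load_data to confirm that each class has the expected number of images.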

3. Training the model

With the data ready, the next step is to train the model. Here is the training code:

import os
import tensorflow as tf    # this script uses the TensorFlow 1.x API (tf.placeholder, tf.Session)
import numpy as np
from read_data import load_data


EMOTIONS = ['chijing', 'daku', 'gaoxing', 'juezui', 'zhoumei',
            'taitou', 'ditou', 'xiangzuokan', 'xiangyoukan', 'youyu']

def deepnn(x):
    x_image = tf.reshape(x, [-1, 128, 128, 1])

    # conv1
    w_conv1 = weight_variables([5, 5, 1, 64])
    b_conv1 = bias_variable([64])
    h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)
    # pool1
    h_pool1 = maxpool(h_conv1)
    # norm1
    norm1 = tf.nn.lrn(h_pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)

    # conv2
    w_conv2 = weight_variables([3, 3, 64, 64])
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(norm1, w_conv2) + b_conv2)
    norm2 = tf.nn.lrn(h_conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
    h_pool2 = maxpool(norm2)

    # Fully connected layer
    w_fc1 = weight_variables([32 * 32 * 64, 384])
    b_fc1 = bias_variable([384])
    h_conv3_flat = tf.reshape(h_pool2, [-1, 32 * 32 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_conv3_flat, w_fc1) + b_fc1)

    # Fully connected layer
    w_fc2 = weight_variables([384, 192])
    b_fc2 = bias_variable([192])
    h_fc2 = tf.matmul(h_fc1, w_fc2) + b_fc2

    # linear
    w_fc3 = weight_variables([192, 10])         # 10 classes in total
    b_fc3 = bias_variable([10])                 # 10 classes in total
    y_conv = tf.add(tf.matmul(h_fc2, w_fc3), b_fc3)

    return y_conv


def weight_variables(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, w):
    return tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')


def maxpool(x):
    return tf.nn.max_pool(x, ksize=[1, 3, 3, 1],
                            strides=[1, 2, 2, 1], padding='SAME')


def train_model():

    # Build the model --------------------------------------------------
    x = tf.placeholder(tf.float32, [None, 16384])
    y_ = tf.placeholder(tf.float32, [None, 10])

    y_conv = deepnn(x)

    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Model built ------------------------------------------------------

    # Load the data
    data_set, label = load_data('./data/list.txt')
    max_train_epochs = 30001
    batch_size = 100

    # './models/emotion_model' is a checkpoint file prefix, so only './models' must exist
    if not os.path.exists('./models'):
        os.makedirs('./models')

    with tf.Session() as sess:
        saver = tf.train.Saver()
        sess.run(tf.global_variables_initializer())

        batch_num = int(data_set.shape[0] / batch_size)

        for i in range(max_train_epochs):
            for j in range(batch_num):
                train_image = data_set[j * batch_size:j * batch_size + batch_size]
                train_image = train_image.reshape(-1, 128*128)
                train_label = label[j * batch_size:j * batch_size + batch_size]
                train_label = np.reshape(train_label, [-1, 10])

                train_step.run(feed_dict={x: train_image, y_: train_label})

            if i % 1 == 0:
                train_accuracy = accuracy.eval(feed_dict={
                    x: train_image, y_: train_label})
                print('epoch %d, training accuracy %f' % (i, train_accuracy))

            if i % 50 == 0:
                saver.save(sess, './models/emotion_model', global_step=i + 1)


if __name__ == '__main__':
    train_model()

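The idea behind the training code is simple: first define the deepNN model structure, then set up the loss, the optimizer, and the related parameters in the train_model function, and finally read the training data, feed it to the model, train, and save the results. Once written, run it directly. The model is saved every 50 epochs under the checkpoint prefix './models/emotion_model'.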
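
One thing to watch in the loop above: it slices data_set in file order. If list.txt was assembled class by class with 100 images per class, every 100-image batch contains a single class, and the printed accuracy covers only the last batch. A common remedy, which is not part of the original code, is to reshuffle the samples every epoch. A minimal sketch, with `shuffled_batches` as a hypothetical helper name:

```python
import numpy as np

def shuffled_batches(data, labels, batch_size, rng=None):
    """Yield (data, labels) mini-batches in a fresh random order.

    As in the loop above, a trailing partial batch is dropped.
    """
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(data))
    for start in range(0, len(order) - batch_size + 1, batch_size):
        idx = order[start:start + batch_size]
        yield data[idx], labels[idx]
```

Inside the epoch loop, the inner `for j in range(batch_num)` could then be replaced by `for train_image, train_label in shuffled_batches(data_set, label, batch_size)`, followed by the same reshape and train_step.run calls.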

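4. Testing the model

After training, the last step is testing. This step loads the trained model, opens the camera, and classifies the facial expression in real time. Here is the code: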

import cv2
from train_model import *    # provides deepnn, EMOTIONS, tf and np

EMOJI_DIR = './files/emotion/'
CASC_PATH = './haarcascade_frontalface_alt.xml'
cascade_classifier = cv2.CascadeClassifier(CASC_PATH)

def format_image(image):
    '''
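    Convert a frame to grayscale, detect the largest face, and return it
    cropped and resized to 128x128 together with its (x, y, w, h) box.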
    '''
    if len(image.shape) > 2 and image.shape[2] == 3:
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = cascade_classifier.detectMultiScale(
        image, scaleFactor=1.3, minNeighbors=5)

    # Return None if no face is found in the image
    if not len(faces) > 0:
        return None, None

    # Keep the detection with the largest area; each face is (x, y, w, h)
    max_area_face = faces[0]
    for face in faces:
        if face[2] * face[3] > max_area_face[2] * max_area_face[3]:
            max_area_face = face

    # Crop the face: rows span y..y+h, columns span x..x+w
    face_coor = max_area_face
    image = image[face_coor[1]:(face_coor[1] + face_coor[3]), face_coor[0]:(face_coor[0] + face_coor[2])]

    # Resize image to network size
    try:
        image = cv2.resize(image, (128, 128), interpolation=cv2.INTER_CUBIC)
    except Exception:
        print("[+] Problem during resize")
        return None, None
    return image, face_coor


def face_dect(image):
    """
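    Detect faces in the image and return the largest face region
    together with a 48x48 copy of it scaled to [0, 1].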
    """
    if len(image.shape) > 2 and image.shape[2] == 3:
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = cascade_classifier.detectMultiScale(
        image, scaleFactor=1.3, minNeighbors=5)

    if not len(faces) > 0:
        return None
    max_face = faces[0]
    for face in faces:
        if face[2] * face[3] > max_face[2] * max_face[3]:
            max_face = face
    # Crop the face: rows span y..y+h, columns span x..x+w
    face_image = image[max_face[1]:(max_face[1] + max_face[3]), max_face[0]:(max_face[0] + max_face[2])]
    try:
        image = cv2.resize(face_image, (48, 48), interpolation=cv2.INTER_CUBIC) / 255.
    except Exception:
        print("[+] Problem during resize")
        return None
    return face_image, image


def resize_image(image, size):
    try:
        image = cv2.resize(image, size, interpolation=cv2.INTER_CUBIC) / 255.
    except Exception:
        print("[+] Problem during resize")
        return None
    return image


def image_to_tensor(image):
    tensor = np.asarray(image).reshape(-1, 128*128) * 1 / 255.0
    return tensor


def demo(modelPath, showBox=False):
    # Build the model -------------------------------------------
    face_x = tf.placeholder(tf.float32, [None, 128*128])
    y_conv = deepnn(face_x)
    probs = tf.nn.softmax(y_conv)
    # Model built -----------------------------------------------

    # Saver used to restore the checkpoint
    saver = tf.train.Saver()
    ckpt = tf.train.get_checkpoint_state(modelPath)
    sess = tf.Session()

    # Load the trained model
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)
        print('Restore model success!!')

    # Load the emoji images
    feelings_faces = []
    for index, emotion in enumerate(EMOTIONS):
        feelings_faces.append(cv2.imread(EMOJI_DIR + emotion + '.png', -1))

    video_captor = cv2.VideoCapture(0)

    emoji_face = []
    result = None

    while True:
        # Grab a frame from the camera and look for a face
        ret, frame = video_captor.read()
        if not ret:
            break
        detected_face, face_coor = format_image(frame)
        if showBox:
            if face_coor is not None:
                [x, y, w, h] = face_coor
                cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)

        if cv2.waitKey(10):
            if detected_face is not None:
                # A face was found: convert it to a tensor and classify it
                tensor = image_to_tensor(detected_face)
                # Compute the probability of each expression class
                result = sess.run(probs, feed_dict={face_x: tensor})
        if result is not None:
            for index, emotion in enumerate(EMOTIONS):
                # Draw the expression name in green
                cv2.putText(frame, emotion, (10, index * 20 + 20), cv2.FONT_HERSHEY_PLAIN, 1, (0, 255, 0), 1)
                # Draw a filled bar whose length is proportional to the class probability
                cv2.rectangle(frame, (130, index * 20 + 10), (130 + int(result[0][index] * 100), (index + 1) * 20 + 4),
                              (255, 0, 0), -1)
                # Pick the emoji of the most probable expression
                emoji_face = feelings_faces[np.argmax(result[0])]
                emoji_face = cv2.resize(emoji_face, (120, 120))

            # Blend the emoji into the frame; source and destination use the same region
            for c in range(0, 3):
                frame[300:420, 10:130, c] = emoji_face[:, :, c] * (emoji_face[:, :, 2] / 255.0) + frame[300:420, 10:130,
                                            c] * (1.0 - emoji_face[:, :, 2] / 255.0)
        cv2.imshow('face', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break


def main(CHECKPOINT_DIR):
    if True:
        demo(CHECKPOINT_DIR)


if __name__ == '__main__':
    CHECKPOINT_DIR = './files/ckpt'
    main(CHECKPOINT_DIR)

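Before running the code above, put the trained model under './files/ckpt/', prepare the emoji images (cropped to 120*120), and put them under './files/emotion/'. Running the code then opens the camera and performs facial expression detection. I only trained for 5000 epochs, and the model does not perform well: some expressions still cannot be recognized accurately. I will look into this further later.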
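
The face-cropping step in format_image can be looked at in isolation. OpenCV's detectMultiScale reports each face as (x, y, w, h), while a NumPy image is indexed rows first, so the crop takes rows y..y+h and columns x..x+w. The sketch below uses plain NumPy and a hypothetical helper name, `crop_largest_face`. It is an illustration, not the original function.

```python
import numpy as np

def crop_largest_face(gray, faces):
    """Crop the largest (x, y, w, h) box from a 2-D grayscale image.

    Returns (crop, box), or (None, None) when no face was detected.
    """
    if len(faces) == 0:
        return None, None
    # Largest area wins, as in format_image
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    return gray[y:y + h, x:x + w], (x, y, w, h)
```

Haar cascade detections are usually square, so swapping w and h often goes unnoticed, but the row/column order matters as soon as a box is not square.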

As a supplement, here is the code for resizing the emoji images to 120*120:

from PIL import Image
import os

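def resize_emotion(input_dir, output_dir):
    # List everything in the input folder (files and sub-folders).
    # Paths are joined explicitly instead of calling os.chdir, so a
    # relative output_dir is still resolved from the starting directory.
    files = os.listdir(input_dir)

    # Create the output folder if it does not exist
    if (not os.path.exists(output_dir)):
        os.makedirs(output_dir)

    for file in files:
        src = os.path.join(input_dir, file)
        # Only process regular files; skip sub-folders
        if (os.path.isfile(src)):
            img = Image.open(src)
            # LANCZOS is the current name of the filter formerly called ANTIALIAS
            img = img.resize((120, 120), Image.LANCZOS)
            img.save(os.path.join(output_dir, file))


if __name__ == '__main__':
    input_dir = './files/emoji/'
    output_dir = './files/emotion/'
    resize_emotion(input_dir, output_dir)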

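IV. Analysis and Summary

1. When collecting your own data, make sure the looking-left and looking-right images are not mirrored.

2. The model does not perform well yet. I think adding more training data is a sensible next step.

3. The complete file structure is: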

-- get_image.py            # script that crawls images for the dataset
-- img_preprocessing.py    # face cropping and preprocessing
-- img_augument.py         # data augmentation
-- make_label.py           # builds the labels and generates the txt files
-- read_data.py            # reads images and labels via list.txt
-- train_model.py          # trains the deepNN model
-- test.py                 # test script: real-time expression recognition from the camera
-- haarcascade_frontalface_alt.xml
-- files                   # trained model and emoji images
    |------ ckpt
                |------ checkpoint
                |------ emotion_model-5001.data-00000-of-00001
                |------ ......
    |------ emotion        # processed emoji images
                |------ chijing.jpg
                |------ ......
-- data                    # processed dataset
    |------ list.txt
    |------ chijing
                |------ img01.jpg
                |------ ......
    |------ daku
                |------ img01.jpg
                |------ ......
    |------ gaoxing
                |------ img01.jpg
                |------ ......
    |------ juezui
                |------ img01.jpg
                |------ ......
    |------ zhoumei
                |------ img01.jpg
                |------ ......
    |------ taitou
                |------ img01.jpg
                |------ ......
    |------ ditou
                |------ img01.jpg
                |------ ......
    |------ xiangzuokan
                |------ img01.jpg
                |------ ......
    |------ xiangyoukan
                |------ img01.jpg
                |------ ......
    |------ youyu
                |------ img01.jpg
                |------ ......

Deep Learning (2): From Building Your Own Dataset to Real-Time Detection of Exaggerated Facial Expressions with deepNN (TensorFlow Implementation)