Real-time Action Recognition Based on Motion Vectors

Paper: Real-time Action Recognition with Enhanced Motion Vector CNNs

Github: https://github.com/zbwglory/MV-release

CVPR 2016

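Building on the deep two-stream architecture, the paper proposes using motion vectors in place of optical flow, which makes the pipeline much faster than the optical-flow version. However, simply swapping optical flow for motion vectors lowers overall accuracy by about 7%. To recover that accuracy, the paper proposes three knowledge-distillation strategies (initialization transfer, supervision transfer, and their combination), which use the optical-flow teacher network (OF-CNN) to initialize and guide the training of the student motion-vector network (MV-CNN).

The final system runs at 390.7 FPS on the UCF101 dataset and 403 FPS on the THUMOS14 dataset, a 27x speedup over the conventional two-stream method.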

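Motion vectors vs. optical flow:

Motion vectors are produced per macroblock, whereas optical flow is computed per pixel. Motion vectors are therefore noisier and coarser-grained, while optical flow is finer-grained. In exchange, motion vectors are fast to obtain, while optical flow is slow to compute.

Contributions:

Builds a CNN-based recognition method on the two-stream model and achieves state-of-the-art results.
Is the first to propose using motion vectors in place of optical flow.
Proposes knowledge-distillation methods that greatly improve accuracy.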

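Network architecture:

Not every frame carries motion vectors. A compressed video is organized into groups of pictures (GOPs), and each GOP contains three types of frames: I-frames, P-frames, and B-frames. An I-frame is intra-coded and carries no motion vectors. A P-frame is a predicted frame and carries motion vectors. A B-frame is a bidirectionally predicted frame and also carries motion vectors.

In practice, I-frames without motion vectors hurt training accuracy. To work around this, the motion vectors of the preceding frame (one that does carry motion vectors) are used in place of the I-frame's missing motion vectors.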
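The I-frame substitution can be sketched as a simple forward fill over a sequence of per-frame motion-vector fields. This is only an illustration: the `FrameMV` struct and the `fillMissingFromPrevious` function are hypothetical names, and representing "no motion vectors" as an empty vector is an assumption, not something taken from the released code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-frame container: an empty `mv` means the frame carried no
// motion vectors (for example, an I-frame).
struct FrameMV {
    std::vector<float> mv;  // flattened motion-vector field; empty if absent
};

// Replace every empty field with the most recent non-empty field before it.
// Leading frames that have no predecessor with motion vectors stay empty.
void fillMissingFromPrevious(std::vector<FrameMV>& frames) {
    const std::vector<float>* last = nullptr;
    for (std::size_t i = 0; i < frames.size(); ++i) {
        if (frames[i].mv.empty()) {
            if (last != nullptr) frames[i].mv = *last;
        } else {
            last = &frames[i].mv;
        }
    }
}
```

A run of consecutive frames without motion vectors all receive a copy of the same earlier field, since `last` only advances on frames that had motion vectors of their own.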

Detailed network architecture: [figure not preserved]

The three steps of knowledge distillation:

Teacher Initialization

The teacher and student networks share the same architecture, so the teacher's weights are used to initialize the student network.

Supervision Transfer

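The teacher network takes images derived from optical flow as input, while the student network takes images derived from motion vectors. The inputs differ, but the two networks should produce the same output.

The output of the teacher's last fully connected layer is divided by a temperature Temp to obtain a softened output, and the same is done for the student's last fully connected layer. A softmax cross-entropy loss then pushes the student's softened output distribution toward the teacher's.

The second loss is the softmax cross-entropy loss between the student network's output and the ground truth.

The final loss is a weighted sum of these two losses.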
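The two losses can be sketched in a few lines of plain C++. This is a minimal illustration of temperature-softened distillation, not the training code from the repository: the function names, the weight `w`, and the choice to apply the weight to the teacher term are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax of logits divided by a temperature; temp > 1 softens the distribution.
std::vector<double> softmaxWithTemp(const std::vector<double>& logits, double temp) {
    assert(!logits.empty() && temp > 0.0);
    const double maxLogit = *std::max_element(logits.begin(), logits.end());
    std::vector<double> p(logits.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        p[i] = std::exp((logits[i] - maxLogit) / temp);  // subtract max for stability
        sum += p[i];
    }
    for (double& v : p) v /= sum;
    return p;
}

// Cross-entropy H(target, pred) = -sum_i target[i] * log(pred[i]).
double crossEntropy(const std::vector<double>& target, const std::vector<double>& pred) {
    assert(target.size() == pred.size());
    double loss = 0.0;
    for (std::size_t i = 0; i < target.size(); ++i) {
        if (target[i] > 0.0) loss -= target[i] * std::log(pred[i]);
    }
    return loss;
}

// Total loss = ground-truth loss + w * teacher-supervision loss (w is assumed).
double distillationLoss(const std::vector<double>& teacherLogits,
                        const std::vector<double>& studentLogits,
                        std::size_t label, double temp, double w) {
    assert(label < studentLogits.size());
    // Teacher term: both networks' last-layer outputs are softened by temp.
    const std::vector<double> softTeacher = softmaxWithTemp(teacherLogits, temp);
    const std::vector<double> softStudent = softmaxWithTemp(studentLogits, temp);
    const double teacherLoss = crossEntropy(softTeacher, softStudent);
    // Ground-truth term: ordinary softmax cross-entropy at temperature 1.
    const std::vector<double> hardStudent = softmaxWithTemp(studentLogits, 1.0);
    const double gtLoss = -std::log(hardStudent[label]);
    return gtLoss + w * teacherLoss;
}
```

Setting `w` to 0 reduces this to ordinary supervised training; a larger `w` makes the student follow the teacher's softened predictions more closely.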

Combination

Teacher Initialization and Supervision Transfer are combined: both are applied together when training the student network.

Experimental results:

Data augmentation during training:

Random cropping at 224×224, 196×196, and 168×168
Random scale jittering with ratios 1, 0.875, and 0.75
Random horizontal flipping
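One way these three augmentations could fit together is sketched below. It assumes that the crop side is 224 multiplied by the jitter ratio (which reproduces 224, 196, and 168) and that the crop is taken from a 340×256 frame; the `CropParams` struct and `sampleCrop` function are hypothetical, and the actual training code may combine the steps differently.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical description of one sampled augmentation.
struct CropParams {
    int x;      // left edge of the crop
    int y;      // top edge of the crop
    int size;   // side length of the square crop
    bool flip;  // whether to mirror the crop horizontally
};

// Sample a square crop whose side is 224 scaled by a random jitter ratio,
// placed uniformly at random inside the frame, with a 50% flip chance.
CropParams sampleCrop(int frameWidth, int frameHeight, std::mt19937& rng) {
    static const double kScales[] = {1.0, 0.875, 0.75};
    const int kBase = 224;
    std::uniform_int_distribution<int> pickScale(0, 2);
    const int size = static_cast<int>(std::lround(kBase * kScales[pickScale(rng)]));
    assert(size <= frameWidth && size <= frameHeight);
    std::uniform_int_distribution<int> pickX(0, frameWidth - size);
    std::uniform_int_distribution<int> pickY(0, frameHeight - size);
    std::bernoulli_distribution pickFlip(0.5);
    CropParams p;
    p.x = pickX(rng);
    p.y = pickY(rng);
    p.size = size;
    p.flip = pickFlip(rng);
    return p;
}
```

The sampled crop would then be resized to the network's input size before being fed to the CNN.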

Code guide:

CMakeLists:

cmake_minimum_required(VERSION 2.8)

project(draw_flow)
set(CMAKE_CXX_FLAGS   "-std=c++11")

FIND_PACKAGE(OpenCV REQUIRED)

include_directories(${OpenCV_INCLUDE_DIRS})
include_directories("/usr/local/include/")

LINK_DIRECTORIES("/usr/local/lib")
add_executable(draw_flow draw_flow.cpp)
target_link_libraries(draw_flow ${OpenCV_LIBS})

mpegflow code:

#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/imgcodecs/imgcodecs.hpp>
#include <opencv2/opencv.hpp>


#include <stdio.h>
#include <iostream>
#include <fstream>
using namespace cv;
using namespace std;



static void convertFlowToImage(const Mat &flow_x, const Mat &flow_y, Mat &img_x, Mat &img_y,
       double lowerBound, double higherBound) {
    #define CAST(v, L, H) ((v) > (H) ? 255 : (v) < (L) ? 0 : cvRound(255*((v) - (L))/((H)-(L))))
    for (int i = 0; i < flow_x.rows; ++i) {
        for (int j = 0; j < flow_x.cols; ++j) {
            float x = flow_x.at<float>(i,j);
            float y = flow_y.at<float>(i,j);
            img_x.at<uchar>(i,j) = CAST(x, lowerBound, higherBound);
            img_y.at<uchar>(i,j) = CAST(y, lowerBound, higherBound);
        }
    }
    #undef CAST
}


int main(int argc, char** argv){
    // IO operation
    const char* keys =
        {
            "{ f  | vidFile      | dump | filename of optical flow}"
            "{ x  | xFlowFile    | flow_x | filename of flow x component }"
            "{ y  | yFlowFile    | flow_y | filename of flow y component }"
            "{ b  | bound | 15 | specify the maximum of optical flow}"
        };



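    // Command-line parsing is disabled, so the values below are hard-coded and
    // any -f/-x/-y/-b flags passed on the command line are ignored.
    //CommandLineParser cmd(argc, argv, keys);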
    string vidFile = "dump.mvs0";//cmd.get<string>("vidFile");
    string xFlowFile = "v_ApplyEyeMakeup_g01_c01/flow_x";//cmd.get<string>("xFlowFile");
    string yFlowFile = "v_ApplyEyeMakeup_g01_c01/flow_y"; //cmd.get<string>("yFlowFile");
    //string imgFile = cmd.get<string>("imgFile");
    int bound = 20;//cmd.get<int>("bound");

    int video_width = 320;
    int video_height = 240;

    int frame_num = 0;
    Mat image, prev_image, prev_grey, grey, frame;

    ifstream fin;
    cout << vidFile << endl;
    fin.open(vidFile.data());
    if (!fin) {
        cout << "error in opening file";
        return -1;
    }


    int frame_prev = 0;
    while(!fin.eof()) {
        // Output optical flow
        int mv_per_frame = -1;
        fin >> mv_per_frame;
        if (mv_per_frame == -1)
            break;
        int forback, blockx,blocky,srcx,srcy,dstx,dsty,minx,miny;
        Mat flow_x(video_height,video_width,CV_32F,Scalar(0));
        Mat flow_y(video_height,video_width,CV_32F,Scalar(0));
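        for (int i=0; i<mv_per_frame; i++) {
            // One record per block: frame index, direction flag, block size,
            // source and destination positions, and the motion vector (minx, miny).
            fin >> frame_num >> forback >> blockx >> blocky >> srcx >> srcy >> dstx >> dsty >> minx >> miny;
            // Paint the vector onto every pixel of the block, treating
            // (dstx, dsty) as the block center.
            for (int x=0; x<blockx; x++) {
                for (int y=0; y<blocky; y++) {
                    // Skip pixels outside the frame and records with forback > 0.
                    if ((dstx-blockx/2+x < 0) || (dsty-blocky/2+y < 0) || (dstx-blockx/2+x > video_width-1) || (dsty-blocky/2+y > video_height-1) || (forback > 0))
                        continue;
                    flow_x.at<float>(dsty-blocky/2+y,dstx-blockx/2+x) = (float)minx;
                    flow_y.at<float>(dsty-blocky/2+y,dstx-blockx/2+x) = (float)miny;
                }
            }
        }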
        frame_num = frame_num-1;

        cv::Mat imgX(flow_x.size(),CV_8UC1);
        cv::Mat imgY(flow_y.size(),CV_8UC1);
        convertFlowToImage(flow_x,flow_y, imgX, imgY, -bound, bound);
        char tmp[20];
        sprintf(tmp,"_%04d.jpg",int(frame_num));

        cv::Mat imgX_, imgY_, imgX_small, imgY_small;
        cv::resize(imgX,imgX_, cv::Size(340,256));
        cv::resize(imgY,imgY_, cv::Size(340,256));

        cv::imwrite(xFlowFile + tmp,imgX_);
        cv::imwrite(yFlowFile + tmp,imgY_);


        while (frame_prev < frame_num-1) {
            frame_prev ++ ;
            char tmp1[20];
            sprintf(tmp1,"_%04d.jpg",int(frame_prev));
            cout << tmp1 << endl;
            cv::imwrite(xFlowFile + tmp1,imgX_);
            cv::imwrite(yFlowFile + tmp1,imgY_);
        }
        frame_prev = frame_num;

    }
    return 0;
}
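The CAST macro in convertFlowToImage maps a motion value from [lowerBound, higherBound] to an 8-bit gray level and clamps anything outside that range. The standalone function below mirrors that mapping so it can be checked without OpenCV. The name `quantizeFlow` is made up for this sketch, and it uses `std::lround` where the original uses `cvRound`, so results may differ for values that land exactly halfway between two gray levels.

```cpp
#include <cassert>
#include <cmath>

// Map v from [low, high] to a gray level in [0, 255], clamping values that
// fall outside the range. Mirrors the CAST macro, with std::lround standing
// in for cvRound.
int quantizeFlow(double v, double low, double high) {
    assert(high > low);
    if (v > high) return 255;
    if (v < low) return 0;
    return static_cast<int>(std::lround(255.0 * (v - low) / (high - low)));
}
```

With the bound of 20 used in the program, zero motion lands near the middle of the gray range, so a static background appears as mid-gray in the output images, while motion of 20 pixels or more in either direction saturates to black or white.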

Commands for extracting motion vectors and converting them to grayscale images (extract_mvs_sample.sh):

tar xvf ffmpeg-2.7.2.tar
mkdir v_ApplyEyeMakeup_g01_c01

gcc -o ffmpeg-2.7.2/doc/examples/extract_mvs extract_mvs.c -L /usr/local/lib/ -lavcodec -lavdevice -lavfilter -lavformat -lavutil

ffmpeg-2.7.2/doc/examples/extract_mvs v_ApplyEyeMakeup_g01_c01.avi > dump.mvs0

./MV-code-release/build/draw_flow -f dump.mvs0 -x v_ApplyEyeMakeup_g01_c01/flow_x -y v_ApplyEyeMakeup_g01_c01/flow_y -b 20

What dump.mvs0 looks like: [screenshot not preserved]
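Judging from how draw_flow.cpp reads the file, each frame in dump.mvs0 appears to be a count followed by that many records of ten whitespace-separated integers. The reader below is a sketch under that assumption; the `MotionVector` struct, its field names, and `readFrame` are hypothetical and only mirror the order in which draw_flow.cpp extracts the values.

```cpp
#include <istream>
#include <sstream>
#include <vector>

// One record, in the order draw_flow.cpp reads it. The last two fields are
// the values it stores as the x and y flow components.
struct MotionVector {
    int frameNum, forback, blockW, blockH, srcX, srcY, dstX, dstY, mvX, mvY;
};

// Read one frame: a record count followed by that many ten-field records.
// Returns false at end of input or if a record is incomplete.
bool readFrame(std::istream& in, std::vector<MotionVector>& out) {
    out.clear();
    int count = 0;
    if (!(in >> count) || count < 0) return false;
    for (int i = 0; i < count; ++i) {
        MotionVector m;
        if (!(in >> m.frameNum >> m.forback >> m.blockW >> m.blockH
                 >> m.srcX >> m.srcY >> m.dstX >> m.dstY >> m.mvX >> m.mvY)) {
            return false;
        }
        out.push_back(m);
    }
    return true;
}
```

Calling `readFrame` in a loop until it returns false walks the whole file one frame at a time, which avoids the `while(!fin.eof())` pattern and its sentinel check.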

Motion vectors rendered as images: [figure not preserved]

watersink
