Detecting and cropping cars and people with YOLO

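This time I am trying a more rigorous writing style. I had been porting the YOLO algorithm to ROS and got it working, but then my lab supervisor came up with a new twist: pose recognition. I am honestly speechless. I only have a basic understanding of deep learning; I am still reading a lot of related papers and following the cs231n course to teach myself the fundamentals. My hands-on skills are weak and I have very little practical experience. I only studied C for one year, and I am shaky on things like pointer operations. I am even tutoring a kid in C at the moment, which is getting embarrassing, so starting this week I will work through some basic C exercises every day. I really need to catch up.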

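Back to the topic. The environment is still Ubuntu 16.04 + CUDA 9.0 + NVIDIA GTX 1050 + OpenCV 3.4.1. The first problem is to keep only people and cars from the YOLO detections and filter out every other label. There are two ways to do this: train my own model on a person-and-car dataset, or discard every label other than person and car in the program. I originally wanted to try the first approach, but people online who have actually done it say training can take about a week and may need a server. Since I have never done it and the deadline is tight, I went with the second approach. When I have time later I will definitely train a model myself, because you only earn the right to an opinion by actually doing it.

So here is how the other labels are discarded in the program. The change goes into the draw_detections function in image.c.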

for(i = 0; i < num; ++i){
    char labelstr[4096] = {0};
    int class = -1;
    for(j = 0; j < classes; ++j){
        if (probs[i][j] > thresh){
            printf("probs:%f\n", probs[i][j]);
            if (class < 0) {
                strcat(labelstr, names[j]);   /* first class above the threshold */
                class = j;
            } else {
                strcat(labelstr, ", ");       /* further classes are appended */
                strcat(labelstr, names[j]);
            }
            printf("%s: %.0f%%\n", names[j], probs[i][j]*100);
        }
    }
    /* ... the rest of the loop body (box drawing) follows ... */
}

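In this for loop, num is the number of detected objects and classes is the number of object categories the model can detect. Here classes is 80, meaning this YOLOv2 model can detect 80 kinds of objects. The inner loop selects every class whose probability exceeds the threshold, appends its name to labelstr, and prints the class name and its probability.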
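The label-building step can be sketched on its own as follows. This is only an illustration: build_label is a hypothetical helper that does not exist in darknet, and it uses snprintf with an explicit buffer size instead of strcat, which is my own substitution rather than what the source does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of darknet): joins the names of all classes
 * whose probability exceeds thresh into out as "a, b, c" and returns the
 * index of the first such class, or -1 if no class passed the threshold. */
int build_label(const float *probs, char **names, int classes,
                float thresh, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    int first = -1;
    size_t used = 0;
    out[0] = '\0';
    for (int j = 0; j < classes; ++j) {
        if (probs[j] <= thresh) continue;
        int n = snprintf(out + used, out_size - used, "%s%s",
                         first < 0 ? "" : ", ", names[j]);
        if (n < 0 || (size_t)n >= out_size - used) {
            out[used] = '\0';   /* drop the partially written entry */
            break;              /* buffer full: stop appending */
        }
        used += (size_t)n;
        if (first < 0) first = j;
    }
    return first;
}
```

Called with the probabilities of one detection (probs[i] in the loop above), it fills the label string and returns what the original loop stores in class.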

int left  = (b.x-b.w/2.)*im.w;
int right = (b.x+b.w/2.)*im.w;
int top   = (b.y-b.h/2.)*im.h;
int bot   = (b.y+b.h/2.)*im.h;

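These lines compute the pixel position of the predicted region (the bounding box) from its normalized centre and size. My own code is added next.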

            /* bool requires #include <stdbool.h> in C */
            bool Is_person = !strcmp(labelstr, "person");
            /* car and truck are combined with ||, so either label is accepted.
               strcmp needs an exact match, so a combined label such as
               "person, car" is not accepted here. */
            bool Is_car = !strcmp(labelstr, "car") || !strcmp(labelstr, "truck");

            if(Is_person || Is_car)
            {
                draw_box_width(im, left, top, right, bot, width, red, green, blue);
                if (alphabet) {
                    /* This part writes the label onto the box that was just drawn.
                       Tracing draw_label shows that the label image is pieced
                       together by index. */
                    image label = get_label(alphabet, labelstr, (im.h*.03)/10);
                    draw_label(im, top + width, left, label, rgb);
                    free_image(label);
                }
                if (masks){
                    image mask = float_to_image(14, 14, 1, masks[i]);
                    image resized_mask = resize_image(mask, b.w*im.w, b.h*im.h);
                    image tmask = threshold_image(resized_mask, .5);
                    embed_image(tmask, im, left, top);
                    free_image(mask);
                    free_image(resized_mask);
                    free_image(tmask);
                }
            }

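With this code, only people and cars are drawn. There are many vehicle categories, but I only selected the car and truck labels. Later I will check the names file for other vehicle labels (bus, for example) that should be added.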
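If more vehicle labels are added later, a small whitelist may be easier to maintain than a chain of strcmp calls. The sketch below is only an assumption about how that could look: is_target_label and kTargets are hypothetical names, and the function also accepts combined labels such as "dog, truck", which the exact strcmp comparison does not.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical whitelist; extend it after checking the names file in use. */
static const char *const kTargets[] = {"person", "car", "truck"};

/* Returns 1 if any ", "-separated entry of labelstr is in kTargets, else 0. */
int is_target_label(const char *labelstr)
{
    assert(labelstr != NULL);
    const char *p = labelstr;
    while (*p) {
        const char *end = strstr(p, ", ");
        size_t len = end ? (size_t)(end - p) : strlen(p);
        for (size_t t = 0; t < sizeof kTargets / sizeof kTargets[0]; ++t) {
            /* compare whole entries only, so "carrot" does not match "car" */
            if (strlen(kTargets[t]) == len && strncmp(p, kTargets[t], len) == 0)
                return 1;
        }
        if (!end) break;
        p = end + 2;   /* skip the ", " separator */
    }
    return 0;
}
```

In draw_detections the condition would then become if(is_target_label(labelstr)), at the cost of losing the separate Is_person and Is_car flags.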

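The lab wants pose recognition, something like a traffic officer's hand signals: which gesture means turn left and which means turn right. I have looked into pose estimation and skeleton recognition, but the results still seem some way from what my supervisors expect. My current idea is to first cut out the person detected by YOLO and then recognize other features on that cropped image, such as contours. That in turn depends on how complex the background is, so I have not thought that far ahead yet. For now the goal is just to crop the person out of the original image.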

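The darknet source stores an image as a one-dimensional array, so to crop an ROI from the original image I first convert the one-dimensional array into a two-dimensional image and then crop it with OpenCV functions.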

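This code was modified inside the darknet_ros package, which differs slightly from the darknet source. darknet_ros calls draw_detections when running detection on live video, whereas in the darknet source this function is only used when detecting on still images.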

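So, imitating the style of the source, I wrote a function that converts the one-dimensional array stored in the image struct into a two-dimensional image type (IplImage). It is defined in image.c: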

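/* Copies a darknet image (floats in [0,1]) into an IplImage.
   img must already be allocated as an 8-bit image with the same
   width, height and channel count as im. */
void Image_to_iplimage(image im, IplImage* img)
{
    int x, y, k;

    image copy = copy_image(im);

    /* darknet stores RGB, OpenCV expects BGR */
    if(copy.c == 3) rgbgr_image(copy);

    int step = img->widthStep;

    for(y = 0; y < copy.h; ++y){
        for(x = 0; x < copy.w; ++x){
            for(k = 0; k < copy.c; ++k){
                img->imageData[y*step + x*copy.c + k] = (unsigned char)(get_pixel(copy,x,y,k)*255);
            }
        }
    }

    free_image(copy);   /* release the temporary copy to avoid a leak */
}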

This function converts the one-dimensional array type into a two-dimensional image type.
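To make the index arithmetic explicit, here is a self-contained sketch of the same conversion without darknet or OpenCV. It assumes the planar layout data[c*h*w + y*w + x] that darknet's get_pixel uses, as far as I understand it. planar_rgb_to_bgr8 is a hypothetical name, and the clamping and rounding are my additions.

```c
#include <assert.h>
#include <stddef.h>

/* Converts a planar float RGB image (values in [0,1], layout
 * data[c*h*w + y*w + x]) into interleaved 8-bit BGR rows.
 * row_step is the number of bytes per destination row (like widthStep). */
void planar_rgb_to_bgr8(const float *data, int w, int h,
                        unsigned char *dst, int row_step)
{
    assert(data && dst && w > 0 && h > 0 && row_step >= 3 * w);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            for (int k = 0; k < 3; ++k) {
                float v = data[(size_t)k * h * w + (size_t)y * w + x];
                if (v < 0.f) v = 0.f;   /* clamp before scaling */
                if (v > 1.f) v = 1.f;
                /* channel k of RGB goes to slot 2-k of BGR */
                dst[(size_t)y * row_step + 3 * x + (2 - k)] =
                    (unsigned char)(v * 255.f + 0.5f);
            }
        }
    }
}
```

The original function performs the channel swap up front with rgbgr_image on a copy; swapping while writing gives the same pixel order without the extra copy.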

void draw_detections_1(image im, int num, float thresh, box *boxes, float **probs, float **masks, char **names, image **alphabet, int classes, box *target_box)
{
    int i,j;

    for(i = 0; i < num; ++i){
        char labelstr[4096] = {0};
        int class = -1;
        for(j = 0; j < classes; ++j){
            if (probs[i][j] > thresh){
                printf("probs:%f\n", probs[i][j]);
                if (class < 0) {
                    strcat(labelstr, names[j]);
                    class = j;
                } else {
                    strcat(labelstr, ", ");
                    strcat(labelstr, names[j]);
                }
                printf("%s: %.0f%%\n", names[j], probs[i][j]*100);
            }
        }

        /* TODO: pick the person standing closest and crop only that one,
           since several people may be in view (no code yet) */
        if(class >= 0){
            int width = im.h * .006;

            /*
               if(0){
               width = pow(prob, 1./2.)*10+1;
               alphabet = 0;
               }
             */

            //printf("%d %s: %.0f%%\n", i, names[class], prob*100);
            int offset = class*123457 % classes;
            float red = get_color(2,offset,classes);
            float green = get_color(1,offset,classes);
            float blue = get_color(0,offset,classes);
            float rgb[3];

            //width = prob*20+2;

            rgb[0] = red;
            rgb[1] = green;
            rgb[2] = blue;
            box b = boxes[i];

            int left = (b.x-b.w/2.)*im.w;
            int right = (b.x+b.w/2.)*im.w;
            int top   = (b.y-b.h/2.)*im.h;
            int bot   = (b.y+b.h/2.)*im.h;

            if(left < 0) left = 0;
            if(right > im.w-1) right = im.w-1;
            if(top < 0) top = 0;
            if(bot > im.h-1) bot = im.h-1;

            /* bool requires #include <stdbool.h> in C */
            bool Is_person = !strcmp(labelstr, "person");
            bool Is_car = !strcmp(labelstr, "car") || !strcmp(labelstr, "truck");

            if(Is_person || Is_car)
            {
                /* Report the clamped box in pixels. If several detections
                   match, the last one overwrites the earlier ones. */
                target_box->x = left*1.0;
                target_box->y = top*1.0;
                target_box->w = (right - left)*1.0;
                target_box->h = (bot - top)*1.0;

               draw_box_width(im, left, top, right, bot, width, red, green, blue);
               if (alphabet) {
                   image label = get_label(alphabet, labelstr, (im.h*.03)/10);
                   draw_label(im, top + width, left, label, rgb);
                   free_image(label);
               }
               if (masks){
                   image mask = float_to_image(14, 14, 1, masks[i]);
                   image resized_mask = resize_image(mask, b.w*im.w, b.h*im.h);
                   image tmask = threshold_image(resized_mask, .5);
                   embed_image(tmask, im, left, top);
                   free_image(mask);
                   free_image(resized_mask);
                   free_image(tmask);
               }
        }
        }
    }
}

We want the coordinates of the bounding box, so draw_detections has to pass the box parameters back to the caller, which means adding one more parameter. Because draw_detections is called in many places, I left it alone and wrote a copy of it that differs only in the extra output parameter: void draw_detections_1(image im, int num, float thresh, box *boxes, float **probs, float **masks, char **names, image **alphabet, int classes, box *target_box).
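Once the box is back in the caller, the crop itself is just a copy of a sub-rectangle. The post does this with OpenCV on the converted image. The sketch below instead works directly on the planar float array, which is a swapped-in technique shown only to illustrate the arithmetic. rect_px, box_to_rect and crop_planar are hypothetical names, not darknet or OpenCV functions.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int x, y, w, h; } rect_px;  /* hypothetical pixel rectangle */

/* Normalized centre/size box -> pixel rectangle clamped to the image,
 * using the same arithmetic as draw_detections_1. */
rect_px box_to_rect(float bx, float by, float bw, float bh, int im_w, int im_h)
{
    int left  = (int)((bx - bw / 2.f) * im_w);
    int right = (int)((bx + bw / 2.f) * im_w);
    int top   = (int)((by - bh / 2.f) * im_h);
    int bot   = (int)((by + bh / 2.f) * im_h);
    if (left < 0) left = 0;
    if (right > im_w - 1) right = im_w - 1;
    if (top < 0) top = 0;
    if (bot > im_h - 1) bot = im_h - 1;
    rect_px r = { left, top, right - left, bot - top };
    return r;
}

/* Copies rectangle r out of a planar float image (layout c*h*w).
 * Returns a newly allocated buffer the caller must free, or NULL. */
float *crop_planar(const float *src, int w, int h, int c, rect_px r)
{
    assert(src && r.x >= 0 && r.y >= 0 && r.w > 0 && r.h > 0);
    assert(r.x + r.w <= w && r.y + r.h <= h);
    float *dst = malloc(sizeof(float) * (size_t)r.w * r.h * c);
    if (!dst) return NULL;
    for (int k = 0; k < c; ++k)
        for (int y = 0; y < r.h; ++y)
            for (int x = 0; x < r.w; ++x)
                dst[((size_t)k * r.h + y) * r.w + x] =
                    src[((size_t)k * h + (r.y + y)) * w + (r.x + x)];
    return dst;
}
```

The values written to target_box correspond to the x, y, w and h fields of rect_px here, so the same rectangle could equally be handed to an OpenCV ROI call after the conversion to IplImage.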

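That is all for now. Later I will gradually write more about the principles of the YOLO algorithm itself, though that may have to wait, since I will probably be working on image segmentation or skeleton recognition in the near future.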

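Last of all, I want to thank my "Uncle Wang", my boyfriend and resident expert. I only get these tasks done by following his guidance every time. Thank you, Uncle Wang.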
