Target-specific object detection with TensorFlow


Overview

I recently took part in a small competition that needed object detection to check fixed positions for people: deciding whether someone is present at a given spot. This differs from pedestrian detection, because most of the people are seated and heavily occluded, so an off-the-shelf pedestrian detector does not work. So the idea was to do secondary development on the object_detection module that ships with TensorFlow to monitor whether a fixed position is occupied. For setting up the environment, you can refer to this blog post. If you don't want to build a dataset and only want to change the source code a little to achieve your goal, read on. If you want to train your own model, head over here instead.

Modifying the code

Under the object_detection directory there is a very important file, "object_detection_tutorial.py". This is the core demo script; running it draws a box and label on every object it detects in the image.
What if we only want to display people, or whatever class we care about, and return their positions?

Let's take a look at this part of the code:

        image = Image.open(image_path)
        # The array-based representation of the image will be used later in order
        # to prepare the result image with boxes and labels on it.
        image_np = load_image_into_numpy_array(image)
        # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
        image_np_expanded = np.expand_dims(image_np, axis=0)
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        # Each box represents a part of the image where a particular object was detected.
        boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        # Each score represents the level of confidence for each of the objects.
        # Score is shown on the result image, together with the class label.
        scores = detection_graph.get_tensor_by_name('detection_scores:0')
        classes = detection_graph.get_tensor_by_name('detection_classes:0')
        num_detections = detection_graph.get_tensor_by_name('num_detections:0')
        # Actual detection.
        (boxes, scores, classes, num_detections) = sess.run(
            [boxes, scores, classes, num_detections],
            feed_dict={image_tensor: image_np_expanded})
        # Visualization of the results of a detection.
        print(classes)
        print(num_detections)
        vis_util.visualize_boxes_and_labels_on_image_array(
            image_np,
            np.squeeze(boxes),
            np.squeeze(classes).astype(np.int32),
            np.squeeze(scores),
            category_index,
            use_normalized_coordinates=True,
            line_thickness=8)
        plt.figure(figsize=IMAGE_SIZE)
        plt.imshow(image_np)
        plt.show()

In the code above, the key function is vis_util.visualize_boxes_and_labels_on_image_array, found in "visualization_utils.py" under the sibling "utils" folder.
Here is the code:

def visualize_boxes_and_labels_on_image_array(
    image,
    boxes,
    classes,
    scores,
    category_index,
    instance_masks=None,
    instance_boundaries=None,
    keypoints=None,
    use_normalized_coordinates=False,
    max_boxes_to_draw=20,
    min_score_thresh=.5,
    agnostic_mode=False,
    line_thickness=4,
    groundtruth_box_visualization_color='black',
    skip_scores=False,
    skip_labels=False):
  
  box_to_display_str_map = collections.defaultdict(list)
  box_to_color_map = collections.defaultdict(str)
  box_to_instance_masks_map = {}
  box_to_instance_boundaries_map = {}
  box_to_keypoints_map = collections.defaultdict(list)
  if not max_boxes_to_draw:
    max_boxes_to_draw = boxes.shape[0]
  for i in range(min(max_boxes_to_draw, boxes.shape[0])):
    if scores is None or scores[i] > min_score_thresh:
      box = tuple(boxes[i].tolist())
      if instance_masks is not None:
        box_to_instance_masks_map[box] = instance_masks[i]
      if instance_boundaries is not None:
        box_to_instance_boundaries_map[box] = instance_boundaries[i]
      if keypoints is not None:
        box_to_keypoints_map[box].extend(keypoints[i])
      if scores is None:
        box_to_color_map[box] = groundtruth_box_visualization_color
      else:
        display_str = ''
        if not skip_labels:
          if not agnostic_mode:
            if classes[i] in category_index.keys():
              class_name = category_index[classes[i]]['name']
            else:
              class_name = 'N/A'
            display_str = str(class_name)
        if not skip_scores:
          if not display_str:
            display_str = '{}%'.format(int(100*scores[i]))
          else:
            display_str = '{}: {}%'.format(display_str, int(100*scores[i]))
        box_to_display_str_map[box].append(display_str)
        if agnostic_mode:
          box_to_color_map[box] = 'DarkOrange'
        else:
          box_to_color_map[box] = STANDARD_COLORS[
              classes[i] % len(STANDARD_COLORS)]

  # Draw all boxes onto image.
  for box, color in box_to_color_map.items():
    ymin, xmin, ymax, xmax = box
    if instance_masks is not None:
      draw_mask_on_image_array(
          image,
          box_to_instance_masks_map[box],
          color=color
      )
    if instance_boundaries is not None:
      draw_mask_on_image_array(
          image,
          box_to_instance_boundaries_map[box],
          color='red',
          alpha=1.0
      )
    draw_bounding_box_on_image_array(
        image,
        ymin,
        xmin,
        ymax,
        xmax,
        color=color,
        thickness=line_thickness,
        display_str_list=box_to_display_str_map[box],
        use_normalized_coordinates=use_normalized_coordinates)
    if keypoints is not None:
      draw_keypoints_on_image_array(
          image,
          box_to_keypoints_map[box],
          color=color,
          radius=line_thickness / 2,
          use_normalized_coordinates=use_normalized_coordinates)

  return image

This is the code that draws the boxes on the image. The category_index parameter describes the classes that can be recognized; tracing it to its source, it comes from mscoco_label_map.pbtxt under the data folder. These are the first three entries:

item {
  name: "/m/01g317"
  id: 1
  display_name: "person"
}
item {
  name: "/m/0199g"
  id: 2
  display_name: "bicycle"
}
item {
  name: "/m/0k4j"
  id: 3
  display_name: "car"
}
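
For reference, label_map_util turns this .pbtxt file into the category_index dictionary that the visualization function consumes. Here is a short sketch of how it is built and queried, using the same calls the tutorial script makes later on (the printed structure is inferred from the three entries above):

from utils import label_map_util

# Build category_index from the COCO label map, as the tutorial does.
label_map = label_map_util.load_labelmap('data/mscoco_label_map.pbtxt')
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=90, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

# category_index maps class id -> {'id': ..., 'name': ...}, for example:
# {1: {'id': 1, 'name': 'person'}, 2: {'id': 2, 'name': 'bicycle'}, ...}
print(category_index[1]['name'])  # 'person'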

Here boxes contains the coordinates of each detected box, classes gives the class of each box, and scores is the confidence that the class matches the box. Now then, with a small change we can keep the information we need and drop what we don't: all it takes is adding a condition. The modified version below only displays the class we specify, person:

def visualize_boxes_and_labels_on_image_array(
    image,
    boxes,
    classes,
    scores,
    category_index,
    instance_masks=None,
    instance_boundaries=None,
    keypoints=None,
    use_normalized_coordinates=False,
    max_boxes_to_draw=20,
    min_score_thresh=.4,
    agnostic_mode=False,
    line_thickness=4,
    groundtruth_box_visualization_color='black',
    skip_scores=True,
    skip_labels=False):

  # Create a display string (and color) for every box location, group any boxes
  # that correspond to the same location.
  box_to_display_str_map = collections.defaultdict(list)
  box_to_color_map = collections.defaultdict(str)
  box_to_instance_masks_map = {}
  box_to_instance_boundaries_map = {}
  box_to_persons = np.array([])
  box_to_keypoints_map = collections.defaultdict(list)
  # print(category_index)
  if not max_boxes_to_draw:
    max_boxes_to_draw = boxes.shape[0]
  for i in range(min(max_boxes_to_draw, boxes.shape[0])):
    if scores is None or scores[i] > min_score_thresh:
        if classes[i] in category_index.keys():
          class_name = category_index[classes[i]]['name']
          # print(class_name)
          if class_name == "person":
              box = tuple(boxes[i].tolist())
              if box_to_persons.shape[0]==0:
                  box_to_persons = boxes[i]
              else:
                  box_to_persons = np.vstack((box_to_persons, boxes[i]))
              if instance_masks is not None:
                box_to_instance_masks_map[box] = instance_masks[i]
              if instance_boundaries is not None:
                box_to_instance_boundaries_map[box] = instance_boundaries[i]
              if keypoints is not None:
                box_to_keypoints_map[box].extend(keypoints[i])
              if scores is None:
                box_to_color_map[box] = groundtruth_box_visualization_color
              else:
                display_str = ''
                if not skip_labels:
                  if not agnostic_mode:
                    # class_name was already set to "person" above
                    display_str = str(class_name)
                if not skip_scores:
                  if not display_str:
                    display_str = '{}%'.format(int(100*scores[i]))
                  else:
                    display_str = '{}: {}%'.format(display_str, int(100*scores[i]))
                box_to_display_str_map[box].append(display_str)
                if agnostic_mode:
                  box_to_color_map[box] = 'DarkOrange'
                else:
                  box_to_color_map[box] = STANDARD_COLORS[
                      classes[i] % len(STANDARD_COLORS)]

  # Draw all boxes onto image.
  for box, color in box_to_color_map.items():
    ymin, xmin, ymax, xmax = box
    if instance_masks is not None:
      draw_mask_on_image_array(
          image,
          box_to_instance_masks_map[box],
          color=color
      )
    if instance_boundaries is not None:
      draw_mask_on_image_array(
          image,
          box_to_instance_boundaries_map[box],
          color='red',
          alpha=1.0
      )
    draw_bounding_box_on_image_array(
        image,
        ymin,
        xmin,
        ymax,
        xmax,
        color=color,
        thickness=line_thickness,
        display_str_list=box_to_display_str_map[box],
        use_normalized_coordinates=use_normalized_coordinates)
    if keypoints is not None:
      draw_keypoints_on_image_array(
          image,
          box_to_keypoints_map[box],
          color=color,
          radius=line_thickness / 2,
          use_normalized_coordinates=use_normalized_coordinates)

  return image,box_to_persons
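
One quirk to be aware of: when exactly one person is detected, box_to_persons comes back as a 1-D array of shape (4,), while two or more detections give a 2-D array of shape (N, 4), because the first box is assigned directly and later ones are stacked with np.vstack. If you want a uniform shape on the caller side, a small normalization helper works (a sketch of my own, not part of the modified file):

import numpy as np

def as_person_matrix(box_to_persons):
    """Normalize the returned person boxes to shape (N, 4), even for 0 or 1 hits."""
    if box_to_persons.size == 0:
        return box_to_persons.reshape(0, 4)
    return np.atleast_2d(box_to_persons)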

Then let's trim it down:

def visualize_boxes_and_labels_on_image_array(
    image,
    boxes,
    classes,
    scores,
    category_index,
    use_normalized_coordinates=False,
    max_boxes_to_draw=20,
    min_score_thresh=.4,
    line_thickness=1,
    groundtruth_box_visualization_color='black',
    ):
  box_to_display_str_map = collections.defaultdict(list)
  box_to_color_map = collections.defaultdict(str)

  box_to_persons = np.array([])
  if not max_boxes_to_draw:
    max_boxes_to_draw = boxes.shape[0]
  for i in range(min(max_boxes_to_draw, boxes.shape[0])):
    if scores is None or scores[i] > min_score_thresh:
        if classes[i] in category_index.keys():
          class_name = category_index[classes[i]]['name']
          if class_name == "person":
              box = tuple(boxes[i].tolist())
              if box_to_persons.shape[0]==0:
                  box_to_persons = boxes[i]
              else:
                  box_to_persons = np.vstack((box_to_persons, boxes[i]))
              # The groundtruth color was immediately overwritten in the version
              # above, so only the class-based color assignment is kept here.
              box_to_color_map[box] = STANDARD_COLORS[
                      classes[i] % len(STANDARD_COLORS)]

  # Draw all boxes onto image.
  for box, color in box_to_color_map.items():
    ymin, xmin, ymax, xmax = box
    draw_bounding_box_on_image_array(
        image,
        ymin,
        xmin,
        ymax,
        xmax,
        color=color,
        thickness=line_thickness,
        display_str_list=box_to_display_str_map[box],
        use_normalized_coordinates=use_normalized_coordinates)

  return image,box_to_persons

This achieves the effect of boxing only the single class you want, and returns the detected positions.
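
With these positions in hand, checking whether a fixed seat is occupied becomes a simple geometry test. Below is a minimal sketch of that check (my own illustration, not part of the original sources): it assumes the normalized [ymin, xmin, ymax, xmax] layout returned above, and a hypothetical SEAT_REGION that you would measure once from your own camera view. The seat counts as occupied when a person box covers enough of it.

import numpy as np

# Hypothetical seat region, normalized [ymin, xmin, ymax, xmax]; measure this
# once from a reference frame of your camera.
SEAT_REGION = np.array([0.4, 0.3, 0.9, 0.5])

def overlap_ratio(box, region):
    """Fraction of the seat region covered by a person box."""
    ymin = max(box[0], region[0])
    xmin = max(box[1], region[1])
    ymax = min(box[2], region[2])
    xmax = min(box[3], region[3])
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    region_area = (region[2] - region[0]) * (region[3] - region[1])
    return inter / region_area if region_area > 0 else 0.0

def seat_occupied(person_boxes, region=SEAT_REGION, thresh=0.3):
    """True if any detected person box covers more than `thresh` of the seat."""
    person_boxes = np.atleast_2d(person_boxes)
    if person_boxes.size == 0:
        return False
    return any(overlap_ratio(b, region) > thresh for b in person_boxes)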

As for detection on a camera feed, you only need to modify the object_detection_tutorial.py file. Here is the code:

import cv2
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
from apscheduler.schedulers.background import BackgroundScheduler
import time
## This is needed to display the images.
#%matplotlib inline

# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
from utils import label_map_util
from utils import visualization_utils as vis_util

# What model to download.
# MODEL_NAME = 'faster_rcnn_resnet50_coco_2018_01_28'
# MODEL_NAME ="faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017"
# MODEL_NAME = 'mask_rcnn_resnet50_atrous_coco_2018_01_28'
MODEL_NAME = 'mask_rcnn_resnet101_atrous_coco_2018_01_28'

MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')

NUM_CLASSES = 90
IMAGE_SIZE = (12, 8)
#download model
opener = urllib.request.URLopener()
# Download the model; if it is already downloaded, the next line can stay commented out.
# opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
  file_name = os.path.basename(file.name)
  if 'frozen_inference_graph.pb' in file_name:
    tar_file.extract(file, os.getcwd())

#Load a (frozen) Tensorflow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')
#Loading label map
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
print(category_index)
#Helper code
def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  print(image.size)
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)



with detection_graph.as_default():
  with tf.Session(graph=detection_graph) as sess:
    # Definite input and output Tensors for detection_graph
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    # Each box represents a part of the image where a particular object was detected.
    detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    # Each score represents the level of confidence for each of the objects.
    # Score is shown on the result image, together with the class label.
    detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
    detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')
    cap = cv2.VideoCapture("rtsp://admin:<password>@<camera-ip>")  # substitute your camera's credentials and address
    ret = 1
    i = 0
    while ret:
        i+=1
        ret,image_np = cap.read()
        if i % 30 == 0:  # the larger this value, the choppier the video
            i=0
            # cv2.imshow("frame", image_np)
            image_np_expanded = np.expand_dims(image_np, axis=0)
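            # Re-fetch the graph tensors on every pass: the sess.run call below
            # rebinds boxes/scores/classes/num_detections to plain numpy arrays.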
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            # Each box represents a part of the image where a particular object was detected.
            boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            # Each score represents the level of confidence for each of the objects.
            # Score is shown on the result image, together with the class label.
            scores = detection_graph.get_tensor_by_name('detection_scores:0')
            classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')
            # Actual detection.
            (boxes, scores, classes, num_detections) = sess.run(
              [boxes, scores, classes, num_detections],
              feed_dict={image_tensor: image_np_expanded})
            # Visualization of the results of a detection.
            image, person = vis_util.visualize_boxes_and_labels_on_image_array(
              image_np,
              np.squeeze(boxes),  # drop the batch dimension
              np.squeeze(classes).astype(np.int32),
              np.squeeze(scores),
              category_index,
              use_normalized_coordinates=True,
              line_thickness=8)
            print(person.shape[0])
            print(person)
            cv2.imshow("frame", image_np)
            if cv2.waitKey(25) & 0xFF == ord('q'):
                break
            # time.sleep(2)
            # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()
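
The boxes returned by the modified function are normalized to [0, 1]. To get pixel coordinates in the camera frame (for cropping, or for the fixed-position check sketched earlier), scale them by the frame size. A small sketch of my own, assuming the (N, 4) [ymin, xmin, ymax, xmax] layout used throughout:

import numpy as np

def to_pixel_boxes(person_boxes, frame):
    """Convert normalized [ymin, xmin, ymax, xmax] boxes to pixel coordinates."""
    h, w = frame.shape[:2]
    boxes = np.atleast_2d(person_boxes)
    if boxes.size == 0:
        return boxes.reshape(0, 4).astype(int)
    return (boxes * np.array([h, w, h, w])).astype(int)

# Example: crop the first detected person out of the current frame.
# ymin, xmin, ymax, xmax = to_pixel_boxes(person, image_np)[0]
# person_crop = image_np[ymin:ymax, xmin:xmax]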


I hope that reading through the trimmed-down code and the modification process has given you a deeper understanding of the source code.



Reposted from blog.csdn.net/qq_32796253/article/details/88847611