Training a New Image Classification Model with InceptionV3 in TensorFlow

1. Building and installing TensorFlow

#CPU
bazel build -c opt //tensorflow/tools/pip_package:build_pip_package
#GPU
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
sudo pip install /tmp/tensorflow_pkg/tensorflow-*.whl

Problem 1:

ImportError: libcudart.so.7.0: cannot open shared object file: No such file or directory

Add the following lines to your .profile file:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda
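
If the import error persists after editing .profile, it can help to confirm that the new value actually reached the current shell. The following is only a minimal sketch: the helper name cuda_lib_dir_on_path is made up for illustration, and the default directory is simply the one used in the export line above.

```python
import os


def cuda_lib_dir_on_path(environ, cuda_lib_dir='/usr/local/cuda/lib64'):
    """Return True if cuda_lib_dir is one of the LD_LIBRARY_PATH entries.

    cuda_lib_dir_on_path is a hypothetical helper, not part of TensorFlow.
    """
    raw = environ.get('LD_LIBRARY_PATH', '')
    # Normalise each entry so a trailing slash does not cause a mismatch.
    entries = [os.path.normpath(p) for p in raw.split(':') if p]
    return os.path.normpath(cuda_lib_dir) in entries


if __name__ == '__main__':
    print(cuda_lib_dir_on_path(os.environ))
```

Remember that .profile is only read by a login shell, so the check may keep reporting False until you log in again or source the file.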

2. Training an image classification model with TensorFlow

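Google provides both a saved Inception v3 model and the training code, so you can use them for training directly.
This post uses transfer learning: the parameters of the earlier layers stay fixed and only the last layer is trained. The last layer is a softmax classifier, which has 1000 output nodes in the original network (ImageNet has 1000 classes). The last layer of the network therefore has to be removed, replaced by one with the number of output nodes you need, and then trained.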

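TensorFlow does it like this: each image in your training set is fed through the network, and the bottleneck layer (the second-to-last layer) produces a 2048-dimensional feature vector. This feature is saved to a txt file and later used to train the softmax classifier.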
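
To make the idea concrete, here is a rough NumPy sketch of training only a softmax layer on precomputed bottleneck vectors. It is not the code used by the retrain script: the function names, the learning rate, the step count, and the random toy data standing in for real bottleneck features are all assumptions made for illustration.

```python
import numpy as np


def train_softmax(features, labels, num_classes, lr=0.1, steps=200):
    """Fit a softmax layer on fixed (precomputed) bottleneck features."""
    n, dim = features.shape
    weights = np.zeros((dim, num_classes))
    biases = np.zeros(num_classes)
    one_hot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = features @ weights + biases
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient of the mean cross-entropy with respect to the logits.
        grad = (probs - one_hot) / n
        weights -= lr * (features.T @ grad)
        biases -= lr * grad.sum(axis=0)
    return weights, biases


def predict(features, weights, biases):
    """Return the most likely class index for each feature vector."""
    return np.argmax(features @ weights + biases, axis=1)


# Toy stand-in for cached bottleneck vectors: 2 classes, 2048 dimensions.
rng = np.random.default_rng(0)
class_means = rng.normal(size=(2, 2048))
labels = np.repeat([0, 1], 20)
features = class_means[labels] + 0.1 * rng.normal(size=(40, 2048))

weights, biases = train_softmax(features, labels, num_classes=2)
accuracy = np.mean(predict(features, weights, biases) == labels)
print('training accuracy:', accuracy)
```

Because the features are computed once and cached, only the small weight matrix is updated, which is why this kind of retraining is much cheaper than training the whole network.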
The steps are as follows:

1. Build and preprocess

bazel build tensorflow/examples/image_retraining:retrain

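If your computer is fairly new (the -mavx option needs a CPU that supports AVX), it is better to build with this command: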

bazel build -c opt --copt=-mavx tensorflow/examples/image_retraining:retrain

The second build is roughly 10 times faster than the first when extracting bottleneck features.

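Once the build finishes, the tool is ready to use. It is still a good idea to edit the retrain Python script in the tensorflow/examples/image_retraining directory, because its default paths point to /tmp, a folder whose contents are wiped whenever the computer shuts down. Change every occurrence of that path to another location, then put your training data set under the new path.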

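The training data set must follow a specific format:
a. The training folder contains several sub-folders. Each sub-folder is one class and holds all the images of that class.
b. Images must be in jpg or jpeg format.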
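
Before starting a long training run, it may be worth checking that the folder really has this layout. The sketch below is one possible way to do so. list_classes is a hypothetical helper written for this post, not a function from the retrain script, and it only looks at file extensions.

```python
import os


def list_classes(image_dir):
    """Map each class (sub-folder name) to its sorted JPEG file paths."""
    classes = {}
    for entry in sorted(os.listdir(image_dir)):
        class_dir = os.path.join(image_dir, entry)
        if not os.path.isdir(class_dir):
            continue  # loose files at the top level belong to no class
        classes[entry] = sorted(
            os.path.join(class_dir, name)
            for name in os.listdir(class_dir)
            if name.lower().endswith(('.jpg', '.jpeg')))
    return classes


if __name__ == '__main__':
    import sys
    if len(sys.argv) > 1:
        for label, images in list_classes(sys.argv[1]).items():
            print(label, len(images))
```

A class that shows up with zero images usually means its files are in another format and need converting first.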

2. Training

Once the data set is in place, run

bazel-bin/tensorflow/examples/image_retraining/retrain --image_dir ~/XXX

--image_dir ~/XXX is the path to the training data set, where XXX is the name of the data set.

Training now starts. It first downloads the original Inception network and saves it in the imagenet folder.
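The downloaded folder contains four files of interest. The first (classify_image_graph_def.pb) is the graph structure of the network. The second is a test image. The third (imagenet_2012_challenge_label_map_proto.pbtxt) is a mapping from each of the final 1000 nodes to an ID. The fourth (imagenet_synset_to_human_label_map.txt) is also a mapping, from that ID to a noun people can read. For example, node number 234 maps to the ID nb20003, and that ID maps to the noun "panda" (this is only an illustration; the number, ID, and noun are made up). These mappings and IDs are defined by the ImageNet 2012 challenge data set.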
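
The two-step lookup can be pictured as two dictionaries chained together. This is a tiny sketch that reuses the made-up values from the example above (234, nb20003, panda). They are not real entries from the downloaded files, and node_to_name is a hypothetical helper.

```python
# Made-up values from the example in the text, not real file contents.
node_id_to_uid = {234: 'nb20003'}
uid_to_human = {'nb20003': 'panda'}


def node_to_name(node_id):
    """Resolve a node index to a readable name, or '' if it is unknown."""
    uid = node_id_to_uid.get(node_id)
    return uid_to_human.get(uid, '')


print(node_to_name(234))
```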

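After the download, the script starts extracting the bottleneck feature of every training image, at about 5 images per second. Training begins once extraction is complete.
Open the image_dir path and you will find two new files there, output.pb and output.txt. The first is the trained graph structure, and the second is the mapping from nodes to nouns. No intermediate ID mapping is produced here unless you define one yourself.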

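So how do you classify images with the trained model? First go back to the downloaded imagenet folder. Running classify.py from tensorflow/model/ImageNet gives the prediction for the test image in that folder.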

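Now put the trained output_graph file into that folder in place of the original graph file (you can rename the output_graph file to the original graph file name, so the code does not need to change). Run classify.py again and it will classify images with your own model. The result is the top-5 node numbers and their probabilities. If you want nouns in the output, you have to define the mapping yourself.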
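
One simple way to define that mapping is to read the label file written during training and index it by node number. The sketch below assumes the file holds one label per line, in the same order as the output nodes. That ordering is an assumption to verify against your own file, and load_labels and top_k are hypothetical helper names.

```python
def load_labels(path):
    """Read one label per line; line i is assumed to name output node i."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def top_k(probabilities, labels, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    return [(labels[i], probabilities[i]) for i in ranked[:k]]


# Toy values standing in for a real label file and a real softmax output.
example_labels = ['cat', 'dog', 'bird']
example_probs = [0.1, 0.7, 0.2]
for label, prob in top_k(example_probs, example_labels, k=2):
    print(label, prob)
```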

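However, this still classifies only one image at a time, so I modified the classify_image script to handle multiple files. If you try rewriting the script yourself, you may run into the following: after predicting a dozen or so files it suddenly fails with an error saying the graph cannot be larger than 2 GB. This happens because every image processed adds nodes to the graph, and once enough have accumulated the graph exceeds 2 GB. The fix is to reset the graph after each image. The modified script already does this, but it also slows classification down to about 1.5 s per image. The script prints the top-2 predictions with their probabilities.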

The modified classify.py code:

# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""Simple image classification with Inception.

Run image classification with Inception trained on ImageNet 2012 Challenge data
set.

This program creates a graph from a saved GraphDef protocol buffer,
and runs inference on an input JPEG image. It outputs human readable
strings of the top 5 predictions along with their probabilities.

Change the --image_file argument to any jpg image to compute a
classification of that image.

Please see the tutorial and website for a detailed description of how
to use this script to perform image recognition.

https://tensorflow.org/tutorials/image_recognition/
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import re

import numpy as np
from six.moves import urllib
import tensorflow as tf

FLAGS = tf.app.flags.FLAGS

# classify_image_graph_def.pb:
#   Binary representation of the GraphDef protocol buffer.
# imagenet_synset_to_human_label_map.txt:
#   Map from synset ID to a human readable string.
# imagenet_2012_challenge_label_map_proto.pbtxt:
#   Text representation of a protocol buffer mapping a label to synset ID.
tf.app.flags.DEFINE_string(
    'model_dir', '/tmp/imagenet',
    """Path to classify_image_graph_def.pb, """
    """imagenet_synset_to_human_label_map.txt, and """
    """imagenet_2012_challenge_label_map_proto.pbtxt.""")
tf.app.flags.DEFINE_string('image_file', '',
                           """Absolute path to image file.""")
tf.app.flags.DEFINE_integer('num_top_predictions', 2,
                            """Display this many predictions.""")

# pylint: disable=line-too-long
DATA_URL = 'http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz'
# pylint: enable=line-too-long


class NodeLookup(object):
  """Converts integer node ID's to human readable labels."""

  def __init__(self,
               label_lookup_path=None,
               uid_lookup_path=None):
    if not label_lookup_path:
      label_lookup_path = os.path.join(
          FLAGS.model_dir, 'imagenet_2012_challenge_label_map_proto.pbtxt')
    if not uid_lookup_path:
      uid_lookup_path = os.path.join(
          FLAGS.model_dir, 'imagenet_synset_to_human_label_map.txt')
    self.node_lookup = self.load(label_lookup_path, uid_lookup_path)

  def load(self, label_lookup_path, uid_lookup_path):
    """Loads a human readable English name for each softmax node.

    Args:
      label_lookup_path: string UID to integer node ID.
      uid_lookup_path: string UID to human-readable string.

    Returns:
      dict from integer node ID to human-readable string.
    """
    if not tf.gfile.Exists(uid_lookup_path):
      tf.logging.fatal('File does not exist %s', uid_lookup_path)
    if not tf.gfile.Exists(label_lookup_path):
      tf.logging.fatal('File does not exist %s', label_lookup_path)

    # Loads mapping from string UID to human-readable string
    proto_as_ascii_lines = tf.gfile.GFile(uid_lookup_path).readlines()
    uid_to_human = {}
    p = re.compile(r'[n\d]*[ \S,]*')
    for line in proto_as_ascii_lines:
      parsed_items = p.findall(line)
      uid = parsed_items[0]
      human_string = parsed_items[2]
      uid_to_human[uid] = human_string

    # Loads mapping from string UID to integer node ID.
    node_id_to_uid = {}
    proto_as_ascii = tf.gfile.GFile(label_lookup_path).readlines()
    for line in proto_as_ascii:
      if line.startswith('  target_class:'):
        target_class = int(line.split(': ')[1])
      if line.startswith('  target_class_string:'):
        target_class_string = line.split(': ')[1]
        node_id_to_uid[target_class] = target_class_string[1:-2]

    # Loads the final mapping of integer node ID to human-readable string
    node_id_to_name = {}
    for key, val in node_id_to_uid.items():
      if val not in uid_to_human:
        tf.logging.fatal('Failed to locate: %s', val)
      name = uid_to_human[val]
      node_id_to_name[key] = name

    return node_id_to_name

  def id_to_string(self, node_id):
    if node_id not in self.node_lookup:
      return ''
    return self.node_lookup[node_id]


def create_graph():
  """Creates a graph from saved GraphDef file in the current default graph."""
  # Creates graph from saved graph_def.pb.
  with tf.gfile.FastGFile(os.path.join(
      FLAGS.model_dir, 'classify_image_graph_def.pb'), 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    _ = tf.import_graph_def(graph_def, name='')


def run_inference_on_image(image):
  """Runs inference on an image.

  Args:
    image: Image file name.

  Returns:
    Nothing
  """
  if not tf.gfile.Exists(image):
    tf.logging.fatal('File does not exist %s', image)
  image_data = tf.gfile.FastGFile(image, 'rb').read()

  # Creates graph from saved GraphDef.
  create_graph()

  with tf.Session() as sess:
    # Some useful tensors:
    # 'softmax:0': A tensor containing the normalized prediction across
    #   1000 labels.
    # 'pool_3:0': A tensor containing the next-to-last layer containing 2048
    #   float description of the image.
    # 'DecodeJpeg/contents:0': A tensor containing a string providing JPEG
    #   encoding of the image.
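    # 'final_result:0': The output of the retrained softmax layer; it replaces
    #   'softmax:0' when the retrained graph is loaded.
    # Runs the retrained softmax tensor by feeding the image_data as input to
    # the graph.
    softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')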
    predictions = sess.run(softmax_tensor,
                           {'DecodeJpeg/contents:0': image_data})
    predictions = np.squeeze(predictions)

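    # Creates node ID --> English string lookup. It is not used below: the
    # retrained graph's node IDs are not ImageNet node IDs, so only the raw
    # node IDs and scores are printed.
    node_lookup = NodeLookup()

    top_k = predictions.argsort()[-FLAGS.num_top_predictions:][::-1]
    for node_id in top_k:
      score = predictions[node_id]
      print(node_id, end='\t')
      print(score, end='\t')
    print()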

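def main(_):
  picture_dir = '/tmp/imagenet/picture'
  for root, dirs, files in os.walk(picture_dir, topdown=False):
    for name in files:
      print(name, end='\t')
      # Join with root rather than picture_dir so that files found in
      # sub-folders resolve to the right path.
      image = (FLAGS.image_file if FLAGS.image_file else
               os.path.join(root, name))
      # Build each image's graph inside a fresh tf.Graph so nodes do not pile
      # up in one default graph and push it past the 2 GB limit.
      with tf.Graph().as_default():
        run_inference_on_image(image)
        # Note: the image is deleted once it has been classified.
        os.remove(image)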

if __name__ == '__main__':
  tf.app.run()

Appendix: Introduction to the Inception model