tf30: Center loss and its application on MNIST


Center loss was proposed in the ECCV 2016 paper "A Discriminative Feature Learning Approach for Deep Face Recognition". The main idea is to add an extra regularization term on top of the softmax loss so that the feature vectors of each class are pulled toward their class center and cluster tightly together. The effect on MNIST is shown in the figure below: the samples of each class gather around their class center.
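
For reference, the formulation from the paper: the center loss L_C penalizes the distance between each deep feature and its class center, it is jointly optimized with the softmax loss L_S under a weight λ, and the centers are updated by a mini-batch moving average (Eq. 4) rather than by gradients:

L_C = \frac{1}{2}\sum_{i=1}^{m}\lVert x_i - c_{y_i}\rVert_2^2, \qquad L = L_S + \lambda L_C

\Delta c_j = \frac{\sum_{i=1}^{m}\delta(y_i = j)\,(c_j - x_i)}{1 + \sum_{i=1}^{m}\delta(y_i = j)}, \qquad c_j^{t+1} = c_j^{t} - \alpha\,\Delta c_j^{t}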

If the code below does not converge or converges slowly, try running it a few more times; this depends on the random initialization of the parameters!
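
A minimal way to make runs reproducible (this is an addition, not part of the original script) is to fix the random seeds right after the imports; the seed value 42 is arbitrary:

import numpy as np
import tensorflow as tf

np.random.seed(42)      # seed numpy (affects any numpy-side shuffling)
tf.set_random_seed(42)  # graph-level seed for TF weight initializers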

Below is a TensorFlow implementation of center loss:

With center loss: mnist_with_center_loss.py

# coding=utf-8
import os
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

slim = tf.contrib.slim
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

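# LAMBDA weights the center-loss term in the total loss (lambda in the paper);
# CENTER_LOSS_ALPHA is the learning rate alpha for the class centers.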
LAMBDA = 0.5
CENTER_LOSS_ALPHA = 0.5
NUM_CLASSES = 10

with tf.name_scope('input'):
    input_images = tf.placeholder(tf.float32, shape=(None,28,28,1), name='input_images')
    labels = tf.placeholder(tf.int64, shape=(None,), name='labels')
    
global_step = tf.Variable(0, trainable=False, name='global_step')

def get_center_loss(features, labels, alpha, num_classes):
    """Compute the center loss and the op that updates the class centers.
    
    Arguments:
        features: Tensor of sample features, usually the output of some fc layer,
            shape [batch_size, feature_length].
        labels: Tensor of sample labels, NOT one-hot encoded, shape [batch_size].
        alpha: number in (0, 1) controlling the learning rate of the class
            centers; see the paper for details.
        num_classes: integer, the total number of classes, i.e. the number of
            neurons in the network's classification output.
    
    Returns:
        loss: Tensor that can be added to the softmax loss and optimized as the total loss.
        centers: Tensor storing the class centers; only useful for inspecting their values.
        centers_update_op: op that updates the class centers; it must be run during
            training, otherwise the centers never change.
    """
    # Feature dimensionality, e.g. 256.
    len_features = features.get_shape()[1]
    # A Variable of shape [num_classes, len_features] that stores the class centers
    # for the whole network; trainable=False because the centers are not updated
    # by gradients.
    centers = tf.get_variable('centers', [num_classes, len_features], dtype=tf.float32,
        initializer=tf.constant_initializer(0), trainable=False)
    # Flatten the labels to 1-D; a no-op if the input is already 1-D.
    labels = tf.reshape(labels, [-1])
    
    # Look up each mini-batch sample's center by its label.
    centers_batch = tf.gather(centers, labels)
    # Compute the loss; tf.nn.l2_loss already includes the 1/2 factor of Eq. (2).
    loss = tf.nn.l2_loss(features - centers_batch)
    
    # Difference between each sample's center and its feature.
    diff = centers_batch - features
    
    # Count how many samples of each class appear in the mini-batch; see Eq. (4) in the paper.
    unique_label, unique_idx, unique_count = tf.unique_with_counts(labels)
    appear_times = tf.gather(unique_count, unique_idx)
    appear_times = tf.reshape(appear_times, [-1, 1])
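    # Hypothetical example: for labels [3, 3, 7], unique_with_counts yields counts
    # [2, 1], and the gather/reshape gives appear_times = [[2], [2], [1]]: each row
    # holds the batch count of that sample's own class.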
    
    diff = diff / tf.cast((1 + appear_times), tf.float32)
    diff = alpha * diff
    
    centers_update_op = tf.scatter_sub(centers, labels, diff)
    
    return loss, centers, centers_update_op

def inference(input_images):
    with slim.arg_scope([slim.conv2d], weights_initializer=slim.variance_scaling_initializer(),
        activation_fn=tf.nn.relu, normalizer_fn=slim.batch_norm, kernel_size=3, padding='SAME'):
        with slim.arg_scope([slim.max_pool2d], kernel_size=2):
            
            x = slim.conv2d(input_images, num_outputs=32, scope='conv1_1')
            x = slim.conv2d(x, num_outputs=32, scope='conv1_2')
            x = slim.max_pool2d(x, scope='pool1')
     
            x = slim.conv2d(x, num_outputs=64, scope='conv2_1')
            x = slim.conv2d(x, num_outputs=64, scope='conv2_2')
            x = slim.max_pool2d(x, scope='pool2')
            
            x = slim.conv2d(x, num_outputs=128, scope='conv3_1')
            x = slim.conv2d(x, num_outputs=128, scope='conv3_2')
            x = slim.max_pool2d(x, scope='pool3')
            
            x = slim.flatten(x, scope='flatten')
            
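            # fc1 is a deliberately 2-D bottleneck so the learned features can be
            # plotted directly in the 2-D scatter plots at the end of the script.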
            feature = slim.fully_connected(x, num_outputs=2, activation_fn=None, scope='fc1')
            
            x = tf.nn.relu(feature)

            x = slim.fully_connected(x, num_outputs=10, activation_fn=None, scope='fc2')
    
    return x, feature

def build_network(input_images, labels, ratio=0.5):
    logits, features = inference(input_images)
    
    with tf.name_scope('loss'):
        with tf.name_scope('center_loss'):
            center_loss, centers, centers_update_op = get_center_loss(features, labels, CENTER_LOSS_ALPHA, NUM_CLASSES)
        with tf.name_scope('softmax_loss'):
            softmax_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
        with tf.name_scope('total_loss'):
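            # Joint supervision: total = softmax loss + lambda-weighted center loss.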
            total_loss = softmax_loss + ratio * center_loss
    
    with tf.name_scope('acc'):
        accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits, 1), labels), tf.float32))
    
    with tf.name_scope('loss/'):
        tf.summary.scalar('CenterLoss', center_loss)
        tf.summary.scalar('SoftmaxLoss', softmax_loss)
        tf.summary.scalar('TotalLoss', total_loss)
        
    return logits, features, total_loss, accuracy, centers_update_op

logits, features, total_loss, accuracy, centers_update_op = build_network(input_images, labels, ratio=LAMBDA)

mnist = input_data.read_data_sets('/Users/liupeng/Desktop/anaconda/center_loss/tmp/mnist', reshape=False)

optimizer = tf.train.AdamOptimizer(0.001)
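# tf.GraphKeys.UPDATE_OPS holds the batch-norm moving-average updates created by
# slim.batch_norm; the center-update op is appended so both run with every train step.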
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
update_ops.append(centers_update_op)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(total_loss, global_step=global_step)

summary_op = tf.summary.merge_all()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
writer = tf.summary.FileWriter('/Users/liupeng/Desktop/anaconda/center_loss/tmp/mnist_log', sess.graph)

mean_data = np.mean(mnist.train.images, axis=0)

step = sess.run(global_step)
while step <= 80000:
    batch_images, batch_labels = mnist.train.next_batch(128)
    _, summary_str, train_acc = sess.run(
        [train_op, summary_op, accuracy],
        feed_dict={
            input_images: batch_images - mean_data,
            labels: batch_labels,
        })
    step += 1
    
    writer.add_summary(summary_str, global_step=step)
    
    if step % 200 == 0:
        vali_image = mnist.validation.images - mean_data
        vali_acc = sess.run(
            accuracy,
            feed_dict={
                input_images: vali_image,
                labels: mnist.validation.labels
            })
        print(("step: {}, train_acc:{:.4f}, vali_acc:{:.4f}".
              format(step, train_acc, vali_acc)))

# Training set
feat = sess.run(features, feed_dict={input_images:mnist.train.images[:10000]-mean_data})

# %matplotlib inline
import matplotlib.pyplot as plt

labels = mnist.train.labels[:10000]

f = plt.figure(figsize=(16,9))
c = ['#ff0000', '#ffff00', '#00ff00', '#00ffff', '#0000ff', 
     '#ff00ff', '#990000', '#999900', '#009900', '#009999']
for i in range(10):
    plt.plot(feat[labels==i,0].flatten(), feat[labels==i,1].flatten(), '.', c=c[i])
plt.legend(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
plt.grid()
plt.show()

# Test set
feat = sess.run(features, feed_dict={input_images:mnist.test.images[:10000]-mean_data})

# %matplotlib inline
import matplotlib.pyplot as plt

labels = mnist.test.labels[:10000]

f = plt.figure(figsize=(16,9))
c = ['#ff0000', '#ffff00', '#00ff00', '#00ffff', '#0000ff', 
     '#ff00ff', '#990000', '#999900', '#009900', '#009999']
for i in range(10):
    plt.plot(feat[labels==i,0].flatten(), feat[labels==i,1].flatten(), '.', c=c[i])
plt.legend(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
plt.grid()
plt.show()

sess.close()



Now compare the results on the training set and the test set:

[Figure: 2-D feature distributions with center loss: training set (left), test set (right)]


TensorBoard visualization:

tensorboard --logdir tmp/mnist_log
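
By default TensorBoard serves the dashboard at http://localhost:6006.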



For comparison, here is the version without center loss: mnist_without_center_loss.py

The script is identical to mnist_with_center_loss.py above except for two changes: the center-loss term is dropped from the total loss, and the center-update op is no longer appended to the training dependencies:

with tf.name_scope('total_loss'):
    total_loss = softmax_loss  # the '+ ratio * center_loss' term is removed

optimizer = tf.train.AdamOptimizer(0.001)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
# update_ops.append(centers_update_op)  # centers are no longer updated
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(total_loss, global_step=global_step)

Everything else (the network, the training loop, and the plotting code) is unchanged.



Again, the results on the training set and the test set:

[Figure: 2-D feature distributions without center loss: training set (left), test set (right)]


Finally, compare the training curves of the two runs (left: with center loss; right: without center loss):


Reposted from blog.csdn.net/u014365862/article/details/79184966