Self-Study Notes on the TensorFlow 2.0 Official Documentation (3)


Choosing a Path Forward

Today it occurred to me that artificial intelligence has far too many directions. If I don't narrow my study down to a specific one, I'll take in a lot of useless information, which amounts to wasted effort when I later look for AI-related work. So today I'll settle on my own direction first, to make my studying more efficient.
Let me start by searching Baidu for the kinds of jobs the AI industry offers:

  • Algorithm engineer (spanning five sub-areas: design, implementation, training, validation, and deployment)
  • Big data engineer (analysis, curation, protection)
  • Cloud computing / cloud security
  • Machine learning engineer, data scientist
  • AI hardware specialist
  • Data labeling specialist
  • Software engineer
  • Software architect
  • Full-stack engineer
  • Product manager

Scientists: research theory and develop or improve algorithms;

Engineers: combine algorithms with business needs and train models;

Engineering support: select, clean, and label data, etc.

Clearly, building on my big data studies, I'll take the scientist / big data engineer / algorithm engineer tracks. Big data engineering will pay the bills, and I'll study AI algorithms in my spare time; after all, the algorithms are the core, and everything else is just groundwork.

Following only the beginner tutorials exposes me to limited information, so let's start with an interview question.

1. What are the four commonly used cross-entropy functions in the deep learning framework TensorFlow?
  Answer: tf.nn.weighted_cross_entropy_with_logits
  tf.nn.sigmoid_cross_entropy_with_logits
  tf.nn.softmax_cross_entropy_with_logits
  tf.nn.sparse_softmax_cross_entropy_with_logits
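
To see how they differ in practice, here is a minimal sketch of calling each one (the tensor values below are made up for illustration); all four take logits, i.e. raw scores before any sigmoid/softmax:

import tensorflow as tf

logits = tf.constant([[2.0, -1.0, 0.5]])   # raw scores for 3 outputs
onehot = tf.constant([[1.0, 0.0, 0.0]])    # one-hot label
index = tf.constant([0])                   # integer class label
binary = tf.constant([[1.0, 0.0, 1.0]])    # independent binary labels

# multi-label / binary: one sigmoid per output
tf.nn.sigmoid_cross_entropy_with_logits(labels=binary, logits=logits)
# same, but positive examples re-weighted by pos_weight
tf.nn.weighted_cross_entropy_with_logits(labels=binary, logits=logits, pos_weight=2.0)
# single-label multi-class, one-hot labels
tf.nn.softmax_cross_entropy_with_logits(labels=onehot, logits=logits)
# single-label multi-class, integer labels
tf.nn.sparse_softmax_cross_entropy_with_logits(labels=index, logits=logits)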

What is cross entropy?

First, let me pull together some information from around the web.

Cross Entropy
Significance: an effective tool for disambiguation in computational linguistics.

Cross entropy is an important concept in Shannon's information theory, used mainly to measure the difference between two probability distributions. The performance of a language model is usually measured by cross entropy and perplexity. Cross entropy can be read as how difficult the text is for the model to recognize, or, from a compression standpoint, how many bits on average are needed to encode each word. Perplexity is the average branching factor with which the model represents the text; its reciprocal can be viewed as the average probability of each word. Smoothing means assigning a probability to unobserved n-grams, so that any word sequence can always receive a probability from the language model. Commonly used smoothing techniques include Good-Turing estimation, deleted interpolation, Katz smoothing, and Kneser-Ney smoothing.

Cross entropy can serve as the loss function in neural networks (machine learning): p denotes the distribution of the true labels and q the label distribution predicted by the trained model, and the cross-entropy loss measures the similarity of p and q. Another benefit of cross entropy as a loss function is that, with sigmoid outputs, it avoids the slowdown in learning that the mean-squared-error loss suffers under gradient descent, because the learning speed is driven by the output error.
[1] In feature engineering, it can be used to measure the similarity between two random variables.
In language modeling (NLP), since the true distribution p is unknown and the model is learned from a training set, cross entropy measures how well that model does on the test set. [2]
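
Written as a formula (the standard definition, not taken from the quoted source): for a true distribution p and a predicted distribution q over the same outcomes,

H(p, q) = -\sum_x p(x) \log q(x)

The closer q is to p, the smaller H(p, q) gets; it reaches its minimum, the entropy H(p), exactly when q = p.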

On understanding cross entropy as a loss function: cross entropy is a common concept in deep learning, generally used to measure the gap between the target and the predicted value. When I worked on classification problems before, I never paid it much attention and simply called the ready-made library functions, which was convenient enough. Recently I started studying generative adversarial networks (GANs), ran into cross entropy again, and realized my understanding of it was fuzzy and shallow. So I spent a few days working through the related concepts from scratch until I finally understood them thoroughly, and I'm recording it here for future reference.

Information theory: cross entropy is a concept from information theory, and to understand its essence we need to start from the most basic concepts.

1. Information content. First, information content. Suppose we hear two pieces of news:
Event A: Brazil reached the 2018 World Cup finals.
Event B: China reached the 2018 World Cup finals.
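
The quoted passage breaks off here, but the point it is building toward is standard information theory: event B is far less probable, so hearing that it happened carries more information. The self-information of an event x with probability p(x) is

I(x) = -\log p(x)

so rarer events carry more information, and entropy is the expected self-information over the whole distribution.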

Cross-Entropy: a term that comes up constantly in the ML field; the article goes on to analyze the concept in detail.

Let me extract the keywords from the above and study them one by one.

The second beginner tutorial in the official documentation

quickstart for experts

Import TensorFlow into your program:

from __future__ import absolute_import, division, print_function, unicode_literals
# __future__: opt in to newer-Python behavior from older interpreters
# absolute_import: make bare imports absolute instead of package-relative
# unicode_literals: make string literals unicode, so Python 2.x behaves like 3.x
# division: true division, returning a float; without it, / on integers floors to an integer
# print_function: use the Python 3.x print() function (with parentheses) in 2.x

import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras import Model

Load and prepare the MNIST dataset.

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Add a channels dimension
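# (Conv2D expects channels-last input, so (60000, 28, 28) becomes (60000, 28, 28, 1))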
x_train = x_train[..., tf.newaxis]
x_test = x_test[..., tf.newaxis]

Use tf.data to batch and shuffle the dataset:

train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(10000).batch(32)

test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)

Build the tf.keras model using the Keras model subclassing API:

class MyModel(Model):
  def __init__(self):
    super(MyModel, self).__init__()
    self.conv1 = Conv2D(32, 3, activation='relu')
    self.flatten = Flatten()
    self.d1 = Dense(128, activation='relu')
    self.d2 = Dense(10, activation='softmax')

  def call(self, x):
    x = self.conv1(x)
    x = self.flatten(x)
    x = self.d1(x)
    return self.d2(x)

# Create an instance of the model
model = MyModel()

Choose an optimizer and loss function for training:

loss_object = tf.keras.losses.SparseCategoricalCrossentropy()

optimizer = tf.keras.optimizers.Adam()

Select metrics to measure the loss and the accuracy of the model.
These metrics accumulate the values over epochs and then print the overall result.

train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')

test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')

Use tf.GradientTape to train the model:

@tf.function
def train_step(images, labels):
  with tf.GradientTape() as tape:
    predictions = model(images)
    loss = loss_object(labels, predictions)
  gradients = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))

  train_loss(loss)
  train_accuracy(labels, predictions)

Test the model:

@tf.function
def test_step(images, labels):
  predictions = model(images)
  t_loss = loss_object(labels, predictions)

  test_loss(t_loss)
  test_accuracy(labels, predictions)

EPOCHS = 5

for epoch in range(EPOCHS):
  for images, labels in train_ds:
    train_step(images, labels)

  for test_images, test_labels in test_ds:
    test_step(test_images, test_labels)

  template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
  print(template.format(epoch+1,
                        train_loss.result(),
                        train_accuracy.result()*100,
                        test_loss.result(),
                        test_accuracy.result()*100))

  # Reset the metrics for the next epoch
  train_loss.reset_states()
  train_accuracy.reset_states()
  test_loss.reset_states()
  test_accuracy.reset_states()

.gradient(loss, model.trainable_variables)?

gradient: slope / rate of change. Here, tape.gradient(loss, model.trainable_variables) differentiates the loss with respect to every trainable variable and returns one gradient per variable.

trainable_variables

Returns all variables created with trainable=True.

tf.compat.v1.trainable_variables(scope=None)

When passed trainable=True, the Variable() constructor automatically adds new variables to the graph collection GraphKeys.TRAINABLE_VARIABLES. This convenience function returns the contents of that collection.

Args:
scope: (Optional.) A string. If supplied, the resulting list is filtered to include only items whose name attribute matches scope using re.match. Items without a name attribute are never returned if a scope is supplied. The choice of re.match means that a scope without special tokens filters by prefix.

Returns:
A list of Variable objects.
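
As a quick check (a sketch, not from the tutorial; in TF 2.x you would read the model attribute rather than call the tf.compat.v1 function), the variables can be listed once the layers have been built, e.g. after one dummy forward pass:

model(tf.zeros([1, 28, 28, 1]))     # running one dummy batch builds the layers
for v in model.trainable_variables:
    print(v.name, v.shape)          # conv kernel/bias, dense kernels/biases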

loss_object?

The loss function object: calling it as loss_object(labels, predictions) computes and returns the loss value.

What is tf.GradientTape()?

A "tape" for automatic differentiation: it records the operations executed inside its context so that the gradient of a result with respect to the watched variables can be computed afterwards.
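
A minimal sketch of the tape used on its own (values made up):

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x                  # operations on variables are recorded
dy_dx = tape.gradient(y, x)    # differentiate the recorded computation
print(dy_dx.numpy())           # 6.0 = d(x^2)/dx at x = 3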

What is @tf.function?

The @tf.function decorator compiles a Python function into TensorFlow graph code, which can then be optimized and executed as a graph.
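
A minimal sketch:

@tf.function
def square(x):
    return x * x               # traced once, then run as a graph

print(square(tf.constant(2.0)).numpy())  # 4.0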

What is .SparseCategoricalAccuracy()?

For binary classification, the metric is binary_accuracy, the most intuitive notion of accuracy. For multi-class or multi-label tasks, the metrics used are typically categorical_accuracy or sparse_categorical_accuracy.

sparse_categorical_accuracy checks whether each value in y_true (itself a class index) equals the index of the largest value in y_pred.
It targets the "sparse" multi-class case, where y_true is just the integer index of the true class.
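
A minimal sketch of the metric (values made up):

m = tf.keras.metrics.SparseCategoricalAccuracy()
m.update_state([1, 2], [[0.1, 0.8, 0.1],   # argmax = 1, matches label 1
                        [0.2, 0.2, 0.6]])  # argmax = 2, matches label 2
print(m.result().numpy())                  # 1.0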

What is .SparseCategoricalCrossentropy()?

Where do categorical_crossentropy and sparse_categorical_crossentropy differ?

If your targets are one-hot encoded, use categorical_crossentropy. One-hot encoding: [0, 0, 1], [1, 0, 0], [0, 1, 0]. If your targets are integer encoded, use sparse_categorical_crossentropy. Integer encoding: 2, 0, 1.

categorical_crossentropy is the multi-class loss function in Keras.
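
A minimal sketch showing that the two losses agree when the labels encode the same classes (values made up):

y_pred = [[0.1, 0.1, 0.8], [0.7, 0.2, 0.1]]
scce = tf.keras.losses.SparseCategoricalCrossentropy()
cce = tf.keras.losses.CategoricalCrossentropy()
print(scce([2, 0], y_pred).numpy())                       # integer labels
print(cce([[0., 0., 1.], [1., 0., 0.]], y_pred).numpy())  # one-hot labels, same value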

What is from_tensor_slices?

tf.data.Dataset.from_tensor_slices() explained:
when converting data with this function, the input is sliced along its first dimension, producing a dataset of the resulting slices.

ds = tf.data.Dataset.from_tensor_slices((x, y))  # slices x and y together, pairing them element for element
train_ds = ds.take(20000).shuffle(20000).batch(100)

take: create a dataset containing at most that many elements

shuffle: shuffle the elements

batch: how many elements to group into one batch

repeat: how many times the data is repeated for training

However, I couldn't find a detailed standalone definition via Baidu or at first glance on the official site; in TF 2.x these appear as methods of tf.data.Dataset rather than as standalone functions.
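
A minimal sketch of the first-dimension slicing (values made up):

ds = tf.data.Dataset.from_tensor_slices(([1, 2, 3], [10, 20, 30]))
for x, y in ds:
    print(x.numpy(), y.numpy())  # 1 10, then 2 20, then 3 30: paired slices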

What is .shuffle?

Shuffles the order of the elements. shuffle(buffer_size) keeps a buffer of that many elements and draws from it at random, so a buffer at least as large as the dataset gives a full uniform shuffle.

What is .batch?

Finally, let's illustrate this with a small example.

Suppose you have a dataset of 200 samples (rows of data) and you choose a batch size of 5 and 1,000 epochs.

This means the dataset will be divided into 40 batches of 5 samples each, and the model weights are updated after every batch of five samples.

It also means that one epoch involves 40 batches, i.e. 40 model updates.

With 1,000 epochs, the model is exposed to (passed over) the entire dataset 1,000 times, for a total of 40,000 batches across the whole training run.
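
That arithmetic can be checked in code (a sketch; Dataset.cardinality is available in recent TF 2.x releases):

ds = tf.data.Dataset.from_tensor_slices(tf.zeros([200, 3])).batch(5)
print(ds.cardinality().numpy())  # 40 batches per epoch; 1000 epochs -> 40000 batches total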

What is Flatten?

Flattens the input. Does not affect the batch size.

If inputs are shaped (batch,) without a channel dimension, then flattening adds an extra channel dimension and output shapes are (batch, 1).

Arguments:

data_format: A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, …, channels) while channels_first corresponds to inputs with shape (batch, channels, …). It defaults to the image_data_format value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be “channels_last”.
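
A minimal sketch of what Flatten does to shapes (values made up):

x = tf.zeros([32, 26, 26, 32])       # a batch of 32 feature maps
y = tf.keras.layers.Flatten()(x)
print(y.shape)                       # (32, 21632): 26 * 26 * 32 flattened, batch axis kept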

What is Conv2D?

2D convolution layer (e.g. spatial convolution over images).

This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well.

When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers, does not include the sample axis), e.g. input_shape=(128, 128, 3) for 128x128 RGB pictures in data_format=“channels_last”.

Arguments:
filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the convolution).

kernel_size: An integer or tuple/list of 2 integers, specifying the height and width of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.

strides: An integer or tuple/list of 2 integers, specifying the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.

padding: one of “valid” or “same” (case-insensitive).

data_format: A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width). It defaults to the image_data_format value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be “channels_last”.

dilation_rate: an integer or tuple/list of 2 integers, specifying the dilation rate to use for dilated convolution. Can be a single integer to specify the same value for all spatial dimensions. Currently, specifying any dilation_rate value != 1 is incompatible with specifying any stride value != 1.

activation: Activation function to use. If you don’t specify anything, no activation is applied (ie. “linear” activation: a(x) = x).

use_bias: Boolean, whether the layer uses a bias vector.

kernel_initializer: Initializer for the kernel weights matrix.

bias_initializer: Initializer for the bias vector.

kernel_regularizer: Regularizer function applied to the kernel weights matrix.

bias_regularizer: Regularizer function applied to the bias vector.

activity_regularizer: Regularizer function applied to the output of the layer (its “activation”).

kernel_constraint: Constraint function applied to the kernel matrix.

bias_constraint: Constraint function applied to the bias vector.

Input shape:
4D tensor with shape: (samples, channels, rows, cols) if data_format=‘channels_first’ or 4D tensor with shape: (samples, rows, cols, channels) if data_format=‘channels_last’.

Output shape:
4D tensor with shape: (samples, filters, new_rows, new_cols) if data_format=‘channels_first’ or 4D tensor with shape: (samples, new_rows, new_cols, filters) if data_format=‘channels_last’. rows and cols values might have changed due to padding.
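
Tying this back to the model above, a minimal sketch of the shape change from Conv2D(32, 3) on an MNIST-sized input (padding defaults to "valid", so the spatial size shrinks):

x = tf.zeros([1, 28, 28, 1])             # one 28x28 grayscale image
y = Conv2D(32, 3, activation='relu')(x)
print(y.shape)                           # (1, 26, 26, 32): 32 filters, 3x3 window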

Summary

After this rough pass of breaking things down and studying the pieces, I can now understand a little of the code above.
Also, happy 1024 Programmers' Day. I ate an orange today and it felt great.
