TensorFlow for deep learning (2): TensorFlow basics

1. TensorFlow system architecture:

  TensorFlow is divided into the device layer and network layer, the data operation layer, the graph computation layer, the API layer, and the application layer. The device layer, network layer, data operation layer, and graph computation layer form the core of TensorFlow.

 

2. TensorFlow design concept:

 (1) Completely separate the definition of a graph from its execution. TensorFlow is fully symbolic programming.

    Symbolic computation generally defines the variables first, then builds a dataflow graph that specifies the computational relationships among those variables, and finally compiles the dataflow graph. At this point the graph is still an empty shell with no actual data in it; only after the required inputs are fed in can data flow through the model and produce output values.

As the snippet below shows, an operation can be defined without actually being run.
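
A minimal sketch of this idea, using the TensorFlow 1.x API assumed throughout this article:

import tensorflow as tf

a = tf.constant(1.0)
b = tf.constant(2.0)
c = tf.add(a, b)
print(c)  # prints the symbolic Tensor, not 3.0; nothing has been computed yet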

 (2) The operations involved in TensorFlow must be placed in a graph, and a graph can only be run inside a session. Once the session is opened, nodes can be fed with data and computations performed; once the session is closed, no computation is possible. The session provides the environment in which operations execute and Tensors are evaluated.

A simple example:
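
The original example is not reproduced here, so the following is a minimal sketch of a graph being run inside a session (the matmul example is an assumption, echoing the classic TensorFlow getting-started example):

import tensorflow as tf

matrix1 = tf.constant([[3., 3.]])
matrix2 = tf.constant([[2.], [2.]])
product = tf.matmul(matrix1, matrix2)  # defined in the graph, nothing runs yet

with tf.Session() as sess:
    print(sess.run(product))  # [[12.]]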

 

3. TensorFlow concepts:

 (1) Edges: TensorFlow's edges carry two kinds of relationships: data dependencies (drawn as solid lines) and control dependencies (drawn as dashed lines). Solid edges represent data dependencies and carry data, that is, tensors; data of any dimensionality is collectively referred to as a tensor. Dashed edges are control dependencies and can be used to control the order in which operations run: no data flows along such an edge, but the source node must finish executing before the destination node starts.
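
A minimal sketch of a control dependency (the variable and the assignment op are assumptions for illustration):

import tensorflow as tf

x = tf.Variable(1.0)
assign_op = tf.assign(x, 2.0)
with tf.control_dependencies([assign_op]):  # dashed edge: assign_op must finish first
    y = x + 0.0  # this add op therefore sees the updated value of x

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y))  # 2.0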

 (2) Nodes: a node represents an operation, generally a mathematical operation applied to its inputs.

 (3) Graph: describes the computational task as a directed acyclic graph. Operations such as tf.constant() add nodes to the default graph:

a = tf.constant([1.0, 2.0])

 (4) Session: the first step in launching a graph is to create a Session object. A session provides methods for executing operations on the graph. Create the object with tf.Session() and call the Session object's run() method to execute the graph:

with tf.Session() as sess:
    result = sess.run([product])  # product is the matmul op defined in the earlier example
    print(result)

 (5) Device: a device is a piece of hardware that can be used for computation and has its own address space, such as a CPU or GPU. Operations are assigned to devices with tf.device():
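
A minimal sketch (the device string "/cpu:0" refers to the machine's first CPU):

with tf.device("/cpu:0"):  # pin the following ops to the first CPU
    a = tf.constant([1.0, 2.0])
    b = tf.constant([3.0, 4.0])
    c = a + b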

 (6) Variable: a variable is a special kind of data that has a fixed position in the graph and does not flow like an ordinary tensor. Variables are created with the tf.Variable() constructor, which requires an initial value; the shape and type of the initial value determine the shape and type of the variable.

# Create a variable, initialized to the scalar 0
state = tf.Variable(0, name="counter")
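
Variables must be initialized inside a session before use. A minimal sketch extending the counter above (the increment op is an assumption for illustration):

one = tf.constant(1)
update = tf.assign(state, tf.add(state, one))  # an op that increments the variable

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # initialize variables first
    for _ in range(3):
        print(sess.run(update))  # 1, then 2, then 3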

 (7) Kernel: A kernel is an implementation of an operation that can run on a specific device (such as CPU, GPU).

 

4. TensorFlow batch normalization:

 Batch normalization (BN) was introduced to overcome the training difficulties caused by ever-deeper neural networks.

 Method: batch normalization is generally applied before the nonlinear mapping (activation function). It normalizes x = Wu + b so that each dimension of the result (each dimension of the output signal) has mean 0 and variance 1.

 Usage: when a neural network converges very slowly, or gradients explode and training fails, batch normalization is worth trying.
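
A minimal sketch using tf.nn.batch_normalization, one of several BN APIs in TensorFlow 1.x (the toy batch and epsilon value are assumptions):

import tensorflow as tf

x = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # a toy batch: x = Wu + b in the text
mean, variance = tf.nn.moments(x, axes=[0])  # per-dimension batch statistics
offset = tf.Variable(tf.zeros([2]))  # learnable beta
scale = tf.Variable(tf.ones([2]))   # learnable gamma
bn = tf.nn.batch_normalization(x, mean, variance, offset, scale, variance_epsilon=1e-3)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(bn))  # each column now has approximately mean 0 and variance 1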

 

5. Neuron functions:

 (1) Activation functions: when an activation function runs, a subset of the neurons in the network is activated, and the activation information is propagated to the next layer of the network. Several commonly used activation functions are introduced below.

  a. sigmoid function: sigmoid(x) = 1 / (1 + exp(-x)) maps a real value into the (0, 1) interval and can be used for binary classification.

The method of use is as follows:

a = tf.constant([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
sess = tf.Session()
print(sess.run(tf.sigmoid(a)))
sess.close()

 

  b. softmax function: softmax maps a k-dimensional real-valued vector (a1, a2, a3, a4, ...) to a vector (b1, b2, b3, b4, ...) in which each bi is a constant between 0 and 1. The magnitudes of the bi can then be used for multi-class classification, e.g. by choosing the dimension with the largest weight.

Function expression: b_i = exp(a_i) / Σ_j exp(a_j)
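
The usage mirrors the sigmoid example above; a minimal sketch with arbitrary input values:

a = tf.constant([1.0, 2.0, 3.0])
with tf.Session() as sess:
    print(sess.run(tf.nn.softmax(a)))  # approx. [0.09 0.245 0.665]; the entries sum to 1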

  c. relu function: the relu function alleviates the sigmoid function's problems of slow gradient descent and vanishing gradients.

The method of use is as follows:

a = tf.constant([-1.0, 2.0])
with tf.Session() as sess:
    b = tf.nn.relu(a)
    print(sess.run(b))  # [0. 2.]: negative inputs are clipped to 0

  d. dropout function: each neuron is kept with probability keep_prob and suppressed otherwise. A suppressed neuron outputs 0; a kept neuron has its output scaled up to 1/keep_prob times its original value. (This can help combat overfitting.)

The method of use is as follows:

a = tf.constant([[-1.0, 2.0, 3.0, 4.0]])
with tf.Session() as sess:
    b = tf.nn.dropout(a, 0.5, noise_shape=[1, 4])  # each element kept with probability 0.5
    print(sess.run(b))

 

 (2) Convolution functions: convolution functions are important building blocks of neural networks; they are two-dimensional filters scanned across a batch of images. A few of these functions are briefly introduced here.

  a. tf.nn.convolution(input, filter, padding, strides=None, dilation_rate=None, name=None, data_format=None): computes sums of N-dimensional convolutions.

  b. tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None): takes a four-dimensional input tensor input and a four-dimensional convolution kernel filter, performs a two-dimensional convolution over the input, and returns the result (see the sketch after this list).

  There are also methods such as tf.nn.depthwise_conv2d() and tf.nn.separable_conv2d(), which are not described one by one here.
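
A minimal sketch of tf.nn.conv2d (the all-ones input and kernel are assumptions chosen to make the shapes easy to follow):

import tensorflow as tf

x = tf.ones([1, 5, 5, 1])  # input: [batch, height, width, channels]
w = tf.ones([3, 3, 1, 1])  # kernel: [height, width, in_channels, out_channels]
y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')

with tf.Session() as sess:
    print(sess.run(y).shape)  # (1, 5, 5, 1): 'SAME' padding preserves height and width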

 

 (3) Pooling functions: in a neural network, a pooling function generally follows a convolution layer. The pooling operation scans a matrix window across the tensor and reduces the number of elements by taking the maximum or the average of the values in each window. The window size of each pooling operation is specified by ksize, and the window moves according to the stride strides.

  a. tf.nn.avg_pool(value, ksize, strides, padding, data_format='NHWC', name=None): computes the average of the elements in each pooling region.

  b. tf.nn.max_pool(value, ksize, strides, padding, data_format='NHWC', name=None): computes the maximum of the elements in each pooling region.
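
A minimal sketch of tf.nn.max_pool (the 4x4 toy input is an assumption):

import tensorflow as tf

x = tf.reshape(tf.range(16, dtype=tf.float32), [1, 4, 4, 1])  # [batch, height, width, channels]
pooled = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')

with tf.Session() as sess:
    print(sess.run(pooled).shape)  # (1, 2, 2, 1): each 2x2 window reduced to its maximum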

 

6. Saving and loading models:

 (1) Saving a model mainly means creating a tf.train.Saver() to save the variables. The checkpoint is produced by calling Saver.save() on the tf.train.Saver object, specifying the save location; model files conventionally use the .ckpt extension.

saver.save(sess, ckpt_dir + "/model.ckpt", global_step=global_step)  # global_step is appended to the checkpoint filename
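
A minimal end-to-end save sketch (the variable and the /tmp path are assumptions):

import tensorflow as tf

counter = tf.Variable(0, name="counter")
saver = tf.train.Saver()  # saves all variables by default

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    save_path = saver.save(sess, "/tmp/model.ckpt")
    print("Model saved to:", save_path)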

 (2) A model can be loaded with saver.restore():

with tf.Session() as sess:
    tf.global_variables_initializer().run()

    ckpt = tf.train.get_checkpoint_state(ckpt_dir)
    if ckpt and ckpt.model_checkpoint_path:
        print(ckpt.model_checkpoint_path)
        saver.restore(sess, ckpt.model_checkpoint_path)  # restore all the parameters

 

 

PS: one picture to understand fitting, overfitting, and underfitting. The drawing is a bit ugly, though...
