[TensorFlow 2.x] GPU setup (memory growth, specifying GPUs)

Every post's motto: You can do more than you think.

0. Preface

This section mainly covers GPU settings, specifically how to save GPU resources. Without further ado, let's get to the main topic:

1. Main Content

1.1 Introduction

  1. By default, TensorFlow allocates (nearly) all of the GPU memory, which prevents other programs from using the GPU and wastes GPU memory and compute resources. There are usually the following ways around this:
  • Enable memory growth
  • Use the virtual device mechanism (similar to partitioning a Windows disk)
  2. Using multiple GPUs
  • Virtual GPUs vs. physical GPUs
  • Manual placement and the distributed (strategy) mechanism
  3. API list
    Note: each comment appears below the corresponding API; a small usage sketch follows at the end of this section.
tf.debugging.set_log_device_placement  
# Print which device each op/variable is placed on
tf.config.experimental.set_visible_devices  
# Set which devices are visible to this process
tf.config.experimental.list_logical_devices  
# Get all logical devices
tf.config.experimental.list_physical_devices  
# Get the list of physical devices
tf.config.experimental.set_memory_growth  
# Enable GPU memory growth
tf.config.experimental.VirtualDeviceConfiguration  
# Create logical partitions (virtual devices)
tf.config.set_soft_device_placement 
# Automatically place ops/variables on a suitable device
  4. Monitoring GPU status
  • One-off check
    Enter the following in the Windows console (Win+R -> cmd) or a Linux terminal:
nvidia-smi
  • Real-time monitoring
    Windows:
nvidia-smi -l

Linux (Ubuntu):

watch -n 0.1 nvidia-smi 
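
As a small illustrative sketch of the first and last entries in the API list above (the tensors below are arbitrary examples, not from the original post), logging device placement prints which device runs each operation, and soft placement lets TensorFlow fall back to another device when the requested one is unavailable:

import tensorflow as tf

# Log the device on which each operation is placed
tf.debugging.set_log_device_placement(True)
# Let TensorFlow pick a valid device automatically if the requested one is unavailable
tf.config.set_soft_device_placement(True)

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0], [2.0]])
print(tf.matmul(a, b))  # the placement log shows which device executed MatMul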

1.2 Enabling memory growth

By default, the program allocates (nearly) all of the GPU memory even though it does not actually use that much, as shown below:
[Figure: nvidia-smi output with the GPU memory almost fully allocated]
To avoid this, enable memory growth for each GPU at the very beginning of the program by calling a function like the following.

import tensorflow as tf


def set_GPU():
    """GPU-related settings."""

    # Print which device each op/variable is placed on
    # tf.debugging.set_log_device_placement(True)
    # Get the number of physical GPUs
    gpus = tf.config.experimental.list_physical_devices('GPU')
    print('Number of physical GPUs:', len(gpus))
    # Enable memory growth
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    print('------------- GPU memory growth enabled --------------')
    # Get the number of logical GPUs
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print('Number of logical GPUs:', len(logical_gpus))

[Figure: console output after enabling memory growth]
GPU usage is now as follows:
[Figure: nvidia-smi output with memory growth enabled]
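
As a minimal usage sketch (assuming the set_GPU function defined above is in the same file; the matmul is just a placeholder workload), the function should be called at the very start of the program, before any tensors or models are created, because memory growth can only be configured before TensorFlow initializes the GPUs:

import tensorflow as tf

if __name__ == '__main__':
    # Configure the GPUs first; set_memory_growth must run before
    # any other TensorFlow call initializes the physical GPUs.
    set_GPU()  # the function defined above

    # Only afterwards create tensors, build models and start training
    x = tf.random.normal([1000, 1000])
    print(tf.matmul(x, x).shape)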

1.3 Specifying which GPU is visible

By default, the program uses the first GPU. We can instead specify which GPU the program runs on. The code is as follows:

def set_GPU():
    """GPU-related settings."""

    # Print which device each op/variable is placed on
    # tf.debugging.set_log_device_placement(True)
    # Get the number of physical GPUs
    gpus = tf.config.experimental.list_physical_devices('GPU')
    print('Number of physical GPUs:', len(gpus))
    # Enable memory growth
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    print('------------- GPU memory growth enabled --------------')

    # Set which GPU is visible to this process, i.e. which GPU to use
    tf.config.experimental.set_visible_devices(gpus[-1], 'GPU')
    # Get the number of logical GPUs
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print('Number of logical GPUs:', len(logical_gpus))

Description:

  1. My own machine has only one GPU, so this program was run on a server, which is why the nvidia-smi output looks slightly different.
  2. As shown in the code, we make only the last GPU visible, i.e. we use the last GPU.
  3. There are still two physical GPUs, but now there is only one logical GPU.
  4. Memory growth is still enabled as well.
    [Figure: console output and nvidia-smi after restricting the visible GPU]

1.4 Splitting a GPU (virtual devices)

Description:

  1. This is similar to partitioning a physical disk: a computer usually has only one physical disk, but we divide it into several partitions such as the C drive, the D drive, and so on.
  2. A physical GPU is a real GPU; a logical GPU is a virtual GPU (physical GPUs are mapped to logical GPUs). The code is as follows:
def set_GPU():
    """GPU-related settings."""

    # Print which device each op/variable is placed on
    # tf.debugging.set_log_device_placement(True)
    # Get the number of physical GPUs
    gpus = tf.config.experimental.list_physical_devices('GPU')
    print('Number of physical GPUs:', len(gpus))
    # Enable memory growth
    # for gpu in gpus:
    #     tf.config.experimental.set_memory_growth(gpu, True)
    # print('------------- GPU memory growth enabled --------------')

    # Set which GPU is visible to this process, i.e. which GPU to use
    tf.config.experimental.set_visible_devices(gpus[-1], 'GPU')
    # Split the physical GPU into logical GPUs
    tf.config.experimental.set_virtual_device_configuration(
        gpus[-1],  # the physical GPU to split
        # number of pieces and the memory size (MB) of each logical GPU
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096),
         tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096), ]
    )
    # Get the number of logical GPUs
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print('Number of logical GPUs:', len(logical_gpus))

As shown in the figures below, the last physical GPU has been split into two pieces, so the number of logical GPUs is now 2, unlike the single logical GPU in (1.2) above.
[Figure: console output and nvidia-smi after splitting the GPU into two logical devices]

1.5 Using multiple GPUs

1.5.1 Manual device placement

The code is as follows:

import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
  
gpus = tf.config.experimental.list_physical_devices('GPU')
print('Number of physical GPUs:', len(gpus))
# Enable memory growth
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
print('------------- GPU memory growth enabled --------------')

# Get the number of logical GPUs
logical_gpus = tf.config.experimental.list_logical_devices('GPU')
print('Number of logical GPUs:', len(logical_gpus))

c = []
# Manually place computation on each GPU
for gpu in logical_gpus:
    print(gpu.name)
    with tf.device(gpu.name):
        a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
        b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
        c.append(tf.matmul(a, b))
with tf.device('/CPU:0'):
    matmul_sum = tf.add_n(c)
print(matmul_sum)

Drawbacks:

  1. Too many details to manage manually
  2. Some devices may not be supported

1.5.2 Distributed Strategy


1. MirroredStrategy

  • Synchronous distributed training
  • Suitable for one machine with multiple cards
  • Each GPU holds a full copy of the model's parameters, and these copies are kept synchronized
  • Data parallelism
    • Each batch is split into N parts, one per GPU
    • Gradients are aggregated and then used to update the parameters on every GPU
  • See the sketch after this list for a minimal usage example

2. CentralStorageStrategy

  • A variant of MirroredStrategy
  • Parameters are not stored on each GPU but on a single device
    • the CPU, or a single GPU
  • Computation is parallel on all GPUs
    • except for the computation that updates the parameters

3. MultiWorkerMirroredStrategy

  • Similar to MirroredStrategy
  • Suitable for multi-machine, multi-card setups

4. TPUStrategy

  • Similar to MirroredStrategy
  • Applies the same strategy on TPUs

5. ParameterServerStrategy

  • Asynchronous distributed training
  • Better suited to large distributed systems
  • Machines are divided into two roles: parameter servers and workers
    • Parameter servers are responsible for aggregating gradients and updating parameters
    • Workers are responsible for the computation, i.e. training the network
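
As a minimal MirroredStrategy sketch (the toy model and random data below are illustrative placeholders, not from the original post): variables created inside strategy.scope() are mirrored across all visible GPUs, and Keras splits each batch across the replicas and aggregates the gradients automatically:

import numpy as np
import tensorflow as tf

# Mirror the model's variables across all visible GPUs on this machine
strategy = tf.distribute.MirroredStrategy()
print('Number of replicas:', strategy.num_replicas_in_sync)

with strategy.scope():
    # The model and its optimizer must be created inside the scope
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dense(1)
    ])
    model.compile(optimizer='adam', loss='mse')

# Toy data; each batch is automatically split across the replicas
x = np.random.random((256, 10)).astype('float32')
y = np.random.random((256, 1)).astype('float32')
model.fit(x, y, batch_size=64, epochs=1)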

Pros and cons of synchronous vs. asynchronous training:

  • Multiple machines, multiple cards
    • Asynchronous training avoids the straggler (weakest-link) effect
  • One machine, multiple cards
    • Synchronous training avoids excessive communication overhead
  • Asynchronous computation can improve the model's generalization ability
    • Asynchronous updates are not strictly correct, so the model learns to tolerate such errors

1.6 Summary

Ways to save GPU memory resources:

  • Memory growth
  • Logical partitioning (virtual devices)

