Setting num_workers in training code


The role of num_workers and where its effect shows up

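num_workers usually controls the number of parallel worker processes used for data loading and preprocessing. Its effect shows up mainly in the time needed to load a batch of images before an epoch of training starts. With a large batch size such as 64 and num_workers set to 0, there is a long wait before each epoch begins iterating; with a small batch size such as 8, the delay is much less noticeable.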
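The following toy sketch illustrates why the wait grows with batch size when loading is serial. It does not use PyTorch: load_batch, the 2 ms per-sample cost, and the thread pool standing in for DataLoader worker processes are all assumptions made for illustration. As in DataLoader, each simulated worker builds complete batches on its own.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SAMPLE_COST_S = 0.002  # assumed cost of reading/preprocessing one image (hypothetical)


def load_batch(batch_size):
    """Stand-in for one worker assembling a full batch, sample by sample."""
    time.sleep(batch_size * SAMPLE_COST_S)
    return list(range(batch_size))


def time_epoch(num_batches, batch_size, workers):
    """Return the seconds needed to produce num_batches batches.

    workers=0 mimics num_workers=0: batches are built one after another
    in the main thread. workers>0 builds several batches concurrently.
    """
    start = time.perf_counter()
    if workers == 0:
        batches = [load_batch(batch_size) for _ in range(num_batches)]
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            batches = list(pool.map(load_batch, [batch_size] * num_batches))
    assert len(batches) == num_batches
    return time.perf_counter() - start


if __name__ == "__main__":
    for batch_size in (8, 64):
        serial = time_epoch(8, batch_size, workers=0)
        parallel = time_epoch(8, batch_size, workers=4)
        print(f"batch_size={batch_size}: serial {serial:.3f}s, 4 workers {parallel:.3f}s")
```

In this model the serial cost scales with batch size times per-sample cost, which is why the delay is easy to notice at batch size 64 and easy to miss at 8. Real loaders add process start-up and inter-process transfer costs that the sketch ignores.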


How to choose the right num_workers

As with batch size, a larger num_workers is in theory better as long as the hardware can support it. In practice, however, there is no definitive rule for what value to use. The reference code below fixes the batch size, image size, and dataset construction, then times data loading to see which num_workers is fastest.

The code is as follows (example):

from dataset import trainIC_dataset
from torch.utils.data import DataLoader
import logging
import multiprocessing as mp  # needed for mp.cpu_count() below
import time

logger = logging.getLogger()

cropsize = (448,448)
image_path = 'lastdata/20231016'


ds = trainIC_dataset(image_path, paste_dir='', size=cropsize, mode='train')
dsval = trainIC_dataset(image_path, paste_dir='', size=cropsize, mode='val')

print(f"num of CPU: {mp.cpu_count()}")
for num_workers in range(2, mp.cpu_count(), 2):
    dl = DataLoader(ds,
                    batch_size=128,
                    shuffle=True,
                    num_workers=num_workers,
                    pin_memory=False,
                    drop_last=False
                    )

    dlval = DataLoader(dsval,
                       batch_size=128,
                       shuffle=True,
                       pin_memory=False,
                       # sampler = sampler_val,
                       num_workers=1,
                       drop_last=False)

    start = time.time()
    for epoch in range(1, 3):
        for i, data in enumerate(dl, 0):
            pass
    end = time.time()
    print("Finish with:{} second, num_workers={}".format(end - start, num_workers))

The code above tests num_workers for the training set. It first reads the number of CPU cores, then tries num_workers = 2, 4, 6, 8, ... in steps of 2 while staying below the core count, and records the time taken to load the dataset for each value. (The validation loader's num_workers is set to 1 here to ensure reproducibility and avoid potential concurrency issues; other values can also be used for efficiency. The validation loader is constructed but not iterated, so it does not affect the timings.)
The experiments show that a larger num_workers does not always mean less time. Changing the batch size or image size also changes which num_workers is fastest: for example, with batchsize=64 the fastest num_workers was 12, while with batchsize=128 it was 8. These timings are for reference only.
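Since the best value shifts with batch size and image size, it can help to wrap the timing loop in a small reusable helper. The sketch below is one possible way to do that, not the author's code: candidate_workers, time_loader, fastest_setting, and the make_loader callback are hypothetical names. Unlike the loop above, candidate_workers includes the core count itself.

```python
import time


def candidate_workers(cpu_count, step=2):
    """Worker counts step, 2*step, ... up to and including cpu_count."""
    return list(range(step, cpu_count + 1, step))


def time_loader(make_loader, num_workers, epochs=2):
    """Seconds taken to iterate the loader built by make_loader(num_workers)."""
    loader = make_loader(num_workers)
    start = time.perf_counter()
    for _ in range(epochs):
        for _batch in loader:
            pass  # only loading is timed; no training step runs
    return time.perf_counter() - start


def fastest_setting(make_loader, candidates, epochs=2):
    """Time every candidate; return (best_num_workers, {num_workers: seconds})."""
    timings = {n: time_loader(make_loader, n, epochs) for n in candidates}
    best = min(timings, key=timings.get)
    return best, timings
```

With the dataset from the article, make_loader could be `lambda n: DataLoader(ds, batch_size=128, shuffle=True, num_workers=n)`, and the candidates could come from `candidate_workers(mp.cpu_count())`. Rerunning fastest_setting after changing the batch size or crop size shows whether the preferred value moves on a given machine.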

Origin blog.csdn.net/qq_43199575/article/details/134030400