torch.utils.data

1. torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0, collate_fn=<function default_collate>, pin_memory=False, drop_last=False, timeout=0, worker_init_fn=None)

Data loader. Combines a dataset and a sampler, and provides single- or multi-process iterators over the dataset. (A minimal usage sketch follows the parameter list below.)

Parameters:
  • dataset (Dataset) – dataset from which to load the data.
  • batch_size (int, optional) – how many samples per batch to load (default: 1).
  • shuffle (bool, optional) – set to True to have the data reshuffled at every epoch (default: False).
  • sampler (Sampler, optional) – defines the strategy to draw samples from the dataset. If specified, shuffle must be False.
  • batch_sampler (Sampler, optional) – like sampler, but returns a batch of indices at a time. Mutually exclusive with batch_size, shuffle, sampler, and drop_last.
  • num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
  • collate_fn (callable, optional) – merges a list of samples to form a mini-batch.
  • pin_memory (bool, optional) – If True, the data loader will copy tensors into CUDA pinned memory before returning them.
  • drop_last (bool, optional) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of the dataset is not divisible by the batch size, then the last batch will be smaller. (default: False)
  • timeout (numeric, optional) – if positive, the timeout value for collecting a batch from workers. Should always be non-negative. (default: 0)
  • worker_init_fn (callable, optional) – If not None, this will be called on each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading. (default: None)
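
A minimal usage sketch with the built-in TensorDataset; the tensor shapes, batch size, and variable names are illustrative, not part of the API:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # 100 samples with 3 features each, plus binary labels (illustrative sizes).
    features = torch.randn(100, 3)
    labels = torch.randint(0, 2, (100,))
    dataset = TensorDataset(features, labels)

    loader = DataLoader(dataset, batch_size=16, shuffle=True)

    for batch_features, batch_labels in loader:
        # With drop_last=False (the default), the final batch holds the
        # remaining 100 % 16 = 4 samples.
        print(batch_features.shape, batch_labels.shape)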

NOTE

By default, each worker will have its PyTorch seed set to base_seed + worker_id, where base_seed is a long generated by the main process using its RNG. However, seeds for other libraries may be duplicated upon initializing workers (e.g., NumPy), causing each worker to return identical random numbers. (See the My data loader workers return identical random numbers section in the FAQ.) You may use torch.initial_seed() to access the PyTorch seed for each worker in worker_init_fn, and use it to set other seeds before data loading.
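
A sketch of the pattern this note describes: read torch.initial_seed() inside worker_init_fn and use it to seed NumPy and Python's random module per worker (the function name seed_worker is illustrative):

    import random
    import numpy as np
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    def seed_worker(worker_id):
        # In a worker, torch.initial_seed() equals base_seed + worker_id,
        # so reducing it modulo 2**32 gives a distinct seed per worker.
        worker_seed = torch.initial_seed() % 2**32
        np.random.seed(worker_seed)
        random.seed(worker_seed)

    dataset = TensorDataset(torch.randn(32, 4))
    loader = DataLoader(dataset, batch_size=8, num_workers=2,
                        worker_init_fn=seed_worker)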

WARNING

If the spawn start method is used, worker_init_fn cannot be an unpicklable object, e.g., a lambda function.
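
A sketch of the distinction this warning draws, assuming the spawn start method is in effect: a module-level function pickles fine, while a lambda would raise a pickling error (init_worker is a hypothetical name):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    def init_worker(worker_id):
        # Defined at module level, so it is picklable under spawn.
        print("initialized worker", worker_id)

    dataset = TensorDataset(torch.arange(16, dtype=torch.float32))
    # worker_init_fn=lambda wid: print(wid) would fail to pickle under
    # spawn; passing the module-level function works.
    loader = DataLoader(dataset, batch_size=4, num_workers=2,
                        worker_init_fn=init_worker)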

Reposted from blog.csdn.net/shaodongheng/article/details/89490142