Pitfalls I hit when training with PyTorch 2.0

Overview

Recently I have been running an experiment that felt rather slow under PyTorch 1.8. I had read that PyTorch 2.0 brings significant speedups, so I decided to run my code on PyTorch 2.0 instead. Along the way I ran into a few small errors and several warnings. To keep them from interfering with the experiment's results, I looked up fixes online and recorded them here.

Problems

FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use-env is set by default in torchrun

It is accompanied by this error:

train.py: error: unrecognized arguments: --local-rank=0

Under PyTorch 1.8 I launched training with:

python -m torch.distributed.launch --master_port 9843 --nproc_per_node=2 train.py

Under PyTorch 2.0, use this command instead:

python -m torch.distributed.launch --master_port 9843 --nproc_per_node=2 --use_env train.py

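Simply add --use_env.
Starting with PyTorch 2.0, torch.distributed.launch passes the rank to the script as --local-rank (with a dash), which an argument parser that only defines --local_rank rejects; that is the "unrecognized arguments" error above. With --use_env, the launcher instead stores each process's rank on the local machine in the environment variable LOCAL_RANK rather than passing it as an argument, so the script should read os.environ["LOCAL_RANK"] instead of args.local_rank.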
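On the script side, train.py then needs to read the rank from the environment. Below is a minimal sketch, assuming train.py uses argparse; get_local_rank is a hypothetical helper name, not part of PyTorch. It prefers LOCAL_RANK and keeps the command-line option only as a fallback, registered under both spellings so it also tolerates a launcher that passes --local-rank=N.

```python
import argparse
import os


def get_local_rank(argv=None) -> int:
    """Return this process's rank on the local machine (hypothetical helper).

    With --use_env (or torchrun) the launcher exports LOCAL_RANK instead of
    passing a command-line argument, so the environment is checked first.
    """
    parser = argparse.ArgumentParser()
    # Accept both spellings; dest comes from the first long option: "local_rank".
    parser.add_argument("--local_rank", "--local-rank", type=int, default=0)
    # parse_known_args ignores the script's other options.
    args, _unknown = parser.parse_known_args(argv)
    env_rank = os.environ.get("LOCAL_RANK")
    return int(env_rank) if env_rank is not None else args.local_rank
```

In a typical training script the returned value would then be passed to torch.cuda.set_device before the process group is initialised.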

UserWarning: The parameter ‘pretrained’ is deprecated since 0.13 and may be removed in the future, please use ‘weights’ instead.

This is because torchvision deprecated the pretrained parameter in version 0.13 in favor of the weights parameter, and the torchvision release that accompanies PyTorch 2.0 (0.15) warns about it.
It is followed by a second warning:

UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.

The line I used under PyTorch 1.8 was:

resnet = models.resnet50(pretrained=True)

The solution is:

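resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

In other words, pass a weights enum instead of pretrained=True. IMAGENET1K_V1 loads the same weights as the old pretrained=True did; ResNet50_Weights.DEFAULT would select the most up-to-date weights instead.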

AttributeError: module ‘numpy’ has no attribute ‘int’.

This is because np.int was deprecated in NumPy 1.20 and removed in NumPy 1.24, so newer versions raise this error. For details, see: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
My original line was:

cls_gt = np.zeros((3, 384, 384), dtype=np.int)

Change to:

cls_gt = np.zeros((3, 384, 384), dtype=np.int_)

or use np.int32 or np.int64 if you need a specific width.
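Besides np.int_, the builtin int also works, since np.int was only an alias for it. A minimal sketch using the array shape from the line above:

```python
import numpy as np

# np.int was an alias for Python's int, so dtype=int reproduces the old
# behaviour exactly; it resolves to NumPy's default integer type, np.int_.
cls_gt = np.zeros((3, 384, 384), dtype=int)

# When the width matters (e.g. to match saved labels), spell it out.
cls_gt64 = np.zeros((3, 384, 384), dtype=np.int64)
```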

np.bool was a deprecated alias for the builtin bool. To avoid this error in existing code, use bool by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.bool_ here.

This is because np.bool was likewise deprecated in NumPy 1.20 and removed in NumPy 1.24.
My original line was:

annotation = annotation.astype(np.bool)

Change to:

annotation = annotation.astype(np.bool_)
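As the error message notes, the builtin bool works equally well. A minimal sketch with a small made-up annotation array standing in for the real data:

```python
import numpy as np

# Made-up label map: 0 is background, other values are object ids.
annotation = np.array([[0, 1, 2], [0, 0, 3]], dtype=np.uint8)

# astype(bool) and astype(np.bool_) give the same dtype: nonzero -> True.
mask = annotation.astype(bool)
assert mask.dtype == np.bool_
```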

UserWarning: ComplexHalf support is experimental and many operators don’t support it yet

Along with this warning, the following error is raised:

RuntimeError: cuFFT only supports dimensions whose sizes are powers of two when computing in half precision

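This happens because I was training with automatic mixed precision (AMP): the tensors reaching the FFT were float16, and cuFFT only supports half-precision transforms whose sizes are powers of two.
My fix was to turn off automatic mixed precision.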
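A possible alternative, which is not what I did above: keep AMP for the rest of the network and run only the FFT in float32. This sketch assumes the model calls torch.fft.fft2; fft2_fp32 is a hypothetical helper name. It disables autocast locally and upcasts the input, so the transform never takes the ComplexHalf path.

```python
import torch


def fft2_fp32(x: torch.Tensor) -> torch.Tensor:
    """2-D FFT computed in float32 even inside an AMP region (hypothetical helper).

    Upcasting avoids the experimental ComplexHalf path, so the output is
    complex64 and any spatial size is accepted.
    """
    with torch.autocast(device_type=x.device.type, enabled=False):
        return torch.fft.fft2(x.float())
```

To follow the simpler route of turning AMP off entirely, passing enabled=False to torch.autocast and to torch.cuda.amp.GradScaler disables it without removing the AMP code.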

Origin blog.csdn.net/qq_41234663/article/details/129896837