Pitfalls I ran into when running DeblurGANv2

Important enough to say three times: connect the server to the Internet first, otherwise you will get some strange errors.

Download the code from GitHub. My environment is Python 3.7 with the GPU build of torch 1.7.1. The two were installed separately: the Python environment was created with Anaconda, and the torch installation is described in the previous article. Then install the packages the program needs from the requirements.txt file in the GitHub repository.

Command:

pip install -r requirements.txt  -i https://pypi.tuna.tsinghua.edu.cn/simple

The -i option tells pip to use the Tsinghua mirror, which makes the packages download faster.

You can also just run pip install -r requirements.txt without the mirror.
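After installing, it can help to confirm that you are really inside the environment you think you are. The snippet below is only a small sketch of such a check: the function name check_environment is my own, and it only reports the Python version and whether a package can be found, not whether the GPU build works.

```python
import importlib.util
import sys


def check_environment(required=("torch",)):
    """Return a dict with the Python version and whether each package is importable."""
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in required:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    print(check_environment())
```

If torch shows up as False here, the requirements were probably installed into a different environment than the active one.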

Training:

Before training, you need to modify files_a in the configuration file config.yaml; it holds the path of the training data.
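A wrong path here only shows up once training starts, so a quick check beforehand can save time. The sketch below assumes that files_a is written as a glob-style pattern (for example with ** and *.png); that is my assumption, so check it against your own config.yaml. The helper name count_matches and the example path are made up for illustration.

```python
import glob


def count_matches(pattern):
    """Count the files matched by a glob pattern; '**' may span sub-directories."""
    return len(glob.glob(pattern, recursive=True))


# Hypothetical usage, replace the pattern with the value you put in files_a:
# print(count_matches("/path/to/train/**/*.png"))
```

If this prints 0, the pattern does not match any file and the path should be fixed before running train.py.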

Then run python train.py. Be sure to be connected to the Internet the first time, because it will automatically download the inceptionresnetv2.pth file. If you are not connected, a ******'\xef' error appears (as shown in the picture). This confused me for a day until my senior helped me solve it. The likely cause is that the file download failed during the first run because there was no network connection. So the first run must be done online.
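If you suspect a broken download, one thing to try is to look at the cached weight file itself. The sketch below lists matching .pth files and their sizes in a directory you pass in. The directory is an assumption on my part: with torch 1.7.1 the download cache is usually ~/.cache/torch/hub/checkpoints, but verify this on your own server. The helper name find_weight_files is made up.

```python
from pathlib import Path


def find_weight_files(cache_dir, stem="inceptionresnetv2"):
    """List cached .pth files whose name starts with `stem`, with their size in bytes."""
    cache = Path(cache_dir).expanduser()
    if not cache.is_dir():
        return []
    return sorted((p, p.stat().st_size) for p in cache.glob(stem + "*.pth"))


# Hypothetical usage (the cache location is an assumption, check your machine):
# for path, size in find_weight_files("~/.cache/torch/hub/checkpoints"):
#     print(path, size)
```

A file that is only a few kilobytes is probably not a complete download. Deleting it and running train.py again while online may be enough to fetch a fresh copy.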

Tips learned:

  1. conda env list: view the Python environments that have been created
  2. conda activate environment name: activate a Python environment
  3. If the server has two GPUs and someone else is already running code on it, launching your network without doing anything extra fails, as shown in the figure.

Solution: set CUDA_VISIBLE_DEVICES when launching training so that your run uses the other GPU:

CUDA_VISIBLE_DEVICES=1 python train.py
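The same idea can be written in Python if you prefer to launch training from a script. This is only a sketch: the function names are mine, and it simply starts train.py as a child process with CUDA_VISIBLE_DEVICES set, which is what the shell command above does.

```python
import os
import subprocess
import sys


def env_for_gpu(gpu_index, base_env=None):
    """Return a copy of the environment that exposes only one GPU to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def run_training_on_gpu(gpu_index, script="train.py"):
    """Run the script with only the chosen GPU visible; returns its exit code."""
    return subprocess.call([sys.executable, script], env=env_for_gpu(gpu_index))


# Hypothetical usage: run_training_on_gpu(1)
```

Inside the training process the selected GPU then appears as device 0, because it is the only one CUDA can see.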
  • Training in the background:
  • Use the command nohup python train.py & to put the training in the background so that it does not block later operations (training takes a long time).
  • Use cat nohup.out to view the contents of the nohup.out file, which holds the output printed during training.
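For comparison, here is a rough Python sketch of the same workflow: start a command detached from the terminal with its output appended to a log file, then read the last lines of that log. It is not a replacement for nohup. The function names are my own, and start_new_session only has this effect on Linux/macOS.

```python
import subprocess
import sys
from collections import deque


def start_in_background(command, log_path="nohup.out"):
    """Start `command` in its own session, appending its output to `log_path`."""
    log = open(log_path, "ab")
    process = subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,
        stdout=log,
        stderr=subprocess.STDOUT,
        start_new_session=True,  # POSIX: detach from the terminal's session
    )
    log.close()  # the child process keeps its own handle to the file
    return process


def tail(log_path, n=20):
    """Return the last `n` lines of the log as a list of strings."""
    with open(log_path, "r", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


# Hypothetical usage:
# start_in_background([sys.executable, "train.py"])
# print("\n".join(tail("nohup.out")))
```

Reading only the tail is handy once the log grows, since cat prints the whole file every time.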

Stop a program running on the server:

Use the nvidia-smi command to view the processes running on the GPUs and their PIDs.

Then kill the process you want to shut down by its PID with the command kill process id
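If you do this often, the two steps can be scripted. The sketch below asks nvidia-smi for its process list in CSV form and parses it; the function names are mine, and stop_process sends SIGTERM, which is what a plain kill does by default. Please make sure the PID belongs to you before killing anything on a shared server.

```python
import csv
import os
import signal
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' CSV rows into (pid, name, memory) tuples."""
    rows = []
    for fields in csv.reader(csv_text.splitlines(), skipinitialspace=True):
        if len(fields) >= 3:
            rows.append((int(fields[0]), fields[1].strip(), fields[2].strip()))
    return rows


def list_gpu_processes():
    """Ask nvidia-smi for the processes currently using the GPUs."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader"],
        universal_newlines=True,
    )
    return parse_compute_apps(output)


def stop_process(pid):
    """Send SIGTERM to the process, like a plain `kill <pid>`."""
    os.kill(pid, signal.SIGTERM)


# Hypothetical usage:
# for pid, name, memory in list_gpu_processes():
#     print(pid, name, memory)
```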

I learned a lot today, so I am writing it down so that I don't forget.

Origin blog.csdn.net/qq_44808827/article/details/122096633