First attempt at the Juchi Cloud (Matpool) server: reproducing the FusionGAN code through a PyCharm remote connection

This article records the pitfalls I ran into and some concepts I had misunderstood, using a reproduction of FusionGAN to get familiar with Juchi Cloud.
Local environment: PyCharm Professional 2023.2 (only the Professional edition supports SSH; the Community edition does not. The Professional edition is paid, but students can apply for a free license)

1. First, rent the environment

Create an account and follow the Juchi Cloud WeChat official account to receive 5 computing power beans, which beginners can use to get familiar with the platform. Most Juchi Cloud machines run Ubuntu; a few run Windows. Windows is only available on machines marked with the Windows icon (labeled "in public beta").

Click Rent to use a machine. Before placing an order, you need to select a system image. Which image to choose depends on the environment the code requires, that is, the one you would otherwise install locally; it has nothing to do with the operating system.

For example, FusionGAN is written with the TensorFlow framework, so I need an image with TensorFlow pre-installed. I had read from other bloggers that too little GPU memory causes errors, so I simply chose a graphics card with more memory.

If you are placing an order for the first time and have not created an environment yet, you need to select the relevant environment under the system image.

Before releasing the machine, we can save the environment, so that next time we can choose it under My Environment and reuse what we have already configured.

Before placing an order, we need to upload the code to the Juchi Cloud network disk: select Workspace > My Network Disk. Juchi Cloud provides 5 GB of free network disk space, where code and datasets can be stored in advance. On the rented machine, the network disk is mounted under /mnt, and everything uploaded to it can be reached with the cd command. Here we upload the local code to the network disk. (You can also download it on the server with git, but that wastes rental time, and time = money.) Download the FusionGAN code locally first, then upload it to the network disk.

Before starting, back up the code, delete the dataset, and keep only the source code to shorten the upload synchronization. (This step can be skipped; I later found it saves little time compared with the other problems.)

Open the project folder in PyCharm.

Select the required GPU to start the rental, and wait for the status to change from starting to running. If the status does not update, refresh the page.

Open PyCharm and select File > Settings.

Select Project > Python Interpreter and click Add Interpreter on the right.

Select On SSH.

Fill in the host, port, and username shown in Juchi Cloud, in that order, and then fill in the password.

Click OK and then Next to reach the fourth step.

This step needs care: if any of the settings below is left unchanged, an error appears and the connection is dropped.

First select Virtualenv Environment, click the folder icon next to Location, and change the path to

 /root/miniconda3/envs/myconda/bin/python

Then change the Base interpreter to

 /root/miniconda3/envs/myconda/bin/python

Finally, change Sync folders to

 /mnt/.../FusionGAN-master
 # the middle part is the folder you named yourself when unzipping

Next, switch to System Interpreter, set the interpreter and sync folders to the same paths as above, and click Create.
Then click OK.
After creation, the interpreter shown in the lower right corner changes to the server's address. PyCharm then uploads the files in the local folder to the server's folder. Click the transfer icon (the double arrow in the lower left corner) to view the upload status.
The upload record shows that, once connected over SSH, the code in the local folder is synchronized to the remote server automatically, so local edits take effect remotely. If synchronization lags, select Tools > Deployment > Upload to and synchronize the code manually.

[2023/9/5 20:31] Upload to root@hz-t3.matpool.com:26634 password
[2023/9/5 20:31] Upload file 'C:\Users\Lenovo\Desktop\看过的文章\code\FusionGAN\FusionGAN-master\LICENSE' to '/mnt/LICENSE'
[2023/9/5 20:31] Upload file 'C:\Users\Lenovo\Desktop\看过的文章\code\FusionGAN\FusionGAN-master\main.py' to '/mnt/main.py'
[2023/9/5 20:31] Upload file 'C:\Users\Lenovo\Desktop\看过的文章\code\FusionGAN\FusionGAN-master\model.py' to '/mnt/model.py'
[2023/9/5 20:31] Upload file 'C:\Users\Lenovo\Desktop\看过的文章\code\FusionGAN\FusionGAN-master\README.md' to '/mnt/README.md'
[2023/9/5 20:31] Upload file 'C:\Users\Lenovo\Desktop\看过的文章\code\FusionGAN\FusionGAN-master\test_one_image.py' to '/mnt/test_one_image.py'
[2023/9/5 20:31] Upload file 'C:\Users\Lenovo\Desktop\看过的文章\code\FusionGAN\FusionGAN-master\utils.py' to '/mnt/utils.py'
[2023/9/5 20:31] Upload to root@hz-t3.matpool.com:26634 password completed in 1 sec, 951 ms: 6 files transferred (25.2 kbit/s)

Click the arrow to select the server address.

A terminal session on the server opens. Change into the /mnt folder and list its files with ls -a:

cd ../mnt
ls -a

Unzip FusionGAN.zip with unzip (if your archive has a different name, substitute it for FusionGAN.zip):

unzip FusionGAN.zip -d FusionGAN

Decompression succeeds.
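
If the unzip command is not available in the chosen image, Python's standard zipfile module can do the same extraction. This is only a minimal sketch: extract_archive is a hypothetical helper name, and the archive and folder names simply mirror the command above.

```python
import zipfile


def extract_archive(archive_path, target_dir):
    """Extract every member of a .zip archive into target_dir.

    Returns the list of member names found in the archive.
    """
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()
```

Calling extract_archive("FusionGAN.zip", "FusionGAN") from /mnt should have the same effect as the unzip command.
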
Now update the remote sync folder: select Tools > Deployment > Configuration > Mappings and change the deployment path to the newly extracted folder.
Then re-upload the files. First click the project in the Project view (if Upload to is grayed out, the project is not selected; to update a single code file, click that file instead), then select Tools > Deployment > Upload to.

Return to the terminal and enter the folder through the cd command

cd FusionGAN
cd FusionGAN-master

2. Start testing

In the FusionGAN-master folder, run the following on the command line:

 python test_one_image.py

An error occurs, indicating that a package is missing. Install it:

pip install scipy==1.1.0

The package installs successfully.
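
Rather than discovering missing packages one failed run at a time, you can check up front which modules the remote interpreter can import. The sketch below relies on the standard library's importlib.util.find_spec; find_missing_modules is a hypothetical helper name, not part of FusionGAN.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be imported."""
    missing = []
    for name in module_names:
        # find_spec returns None when no installed module has this name.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing
```

For example, find_missing_modules(["scipy", "h5py", "tensorflow"]) lists which of the packages mentioned in this article still need a pip install. It checks presence only, not versions, so a pinned version such as scipy==1.1.0 still has to be verified separately.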

Run the test again:

 python test_one_image.py

The test succeeds.

The result is saved in the result folder.

Download it via Tools > Deployment > Download from root…

The files are now downloaded to the local machine.

3. Carry out training

On the command line, run:

python main.py

An error occurs:

OSError: Can't write data (file write failed:

This error is caused by insufficient network disk space. The free network disk is only 5 GB, while the newly generated .h5 files are written to the network disk and are far larger than that, so the write fails. The fix I used was to pay for more space: at least 50 GB is needed to run the whole program. (After purchasing more than 50 GB of network disk space, it occurred to me that I could instead have moved the files off the network disk onto the server itself, for example its desktop, and rerun from the start. Since I had already bought the space, I did not try this; readers who have not yet hit this pitfall can give it a try.)
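
A quick way to avoid this failure is to check the free space before starting a long run. The following is a minimal sketch using the standard library's shutil.disk_usage; the helper names are hypothetical, and a network mount may report its quota differently from a local disk, so treat the number as a rough guide.

```python
import shutil


def free_space_gb(path):
    """Return the free space, in GB, of the filesystem containing path."""
    return shutil.disk_usage(path).free / 1024 ** 3


def has_room(path, required_gb):
    """True if the filesystem containing path has at least required_gb free."""
    return free_space_gb(path) >= required_gb
```

On the rented machine, has_room("/mnt", 50) would indicate whether the network disk can hold the roughly 50 GB this article found necessary.
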
After purchasing enough disk space, a different error appears:

File "main.py", line 45, in main
    srcnn.train(FLAGS)
  File "/mnt/FusionGAN/FusionGAN-master/model.py", line 104, in train
    train_data_ir, train_label_ir = read_data(data_dir_ir)
  File "/mnt/FusionGAN/FusionGAN-master/utils.py", line 31, in read_data
    with h5py.File(path, 'r') as hf:
  File "/root/miniconda3/envs/myconda/lib/python3.5/site-packages/h5py/_hl/files.py", line 408, in __init__
    swmr=swmr)
  File "/root/miniconda3/envs/myconda/lib/python3.5/site-packages/h5py/_hl/files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = './checkpoint/Train_ir/train.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

Change lines 98 and 99 of code in model.py from:

if config.is_train:     
  data_dir_ir = os.path.join('./{}'.format(config.checkpoint_dir), "Train_ir","train.h5")
  data_dir_vi = os.path.join('./{}'.format(config.checkpoint_dir), "Train_vi","train.h5")

Change to

if config.is_train:     
  data_dir_ir = os.path.join("checkpoint_20", "Train_ir","train.h5")
  data_dir_vi = os.path.join("checkpoint_20", "Train_vi","train.h5")
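
The "No such file or directory" error means the path built in model.py did not point at the folder that actually holds train.h5. Before launching training, a small check like the sketch below can confirm the two files exist. The helper name missing_training_files is hypothetical; the subfolder and file names are the ones in the snippet above.

```python
import os


def missing_training_files(checkpoint_dir):
    """Return the expected train.h5 paths under checkpoint_dir that do not exist."""
    expected = [
        os.path.join(checkpoint_dir, "Train_ir", "train.h5"),
        os.path.join(checkpoint_dir, "Train_vi", "train.h5"),
    ]
    return [path for path in expected if not os.path.isfile(path)]
```

Run from the FusionGAN-master folder, missing_training_files("checkpoint_20") should return an empty list once both files are in place.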

Another error follows:

Traceback (most recent call last):
  File "main.py", line 48, in <module>
    tf.app.run()
  File "/root/miniconda3/envs/myconda/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "main.py", line 45, in main
    srcnn.train(FLAGS)
  File "/mnt/FusionGAN/FusionGAN-master/model.py", line 144, in train
    for ep in xrange(config.epoch):
NameError: name 'xrange' is not defined

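Change xrange to range on lines 144 and 147. xrange exists only in Python 2, and the environment here runs Python 3.5 (as the traceback paths show), where range plays the same role.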

for ep in range(config.epoch):
  # Run by batch images
  batch_idxs = len(train_data_ir) // config.batch_size
  for idx in range(0, batch_idxs):
    batch_images_ir = train_data_ir[idx*config.batch_size : (idx+1)*config.batch_size]
    batch_labels_ir = train_label_ir[idx*config.batch_size : (idx+1)*config.batch_size]
    batch_images_vi = train_data_vi[idx*config.batch_size : (idx+1)*config.batch_size]
    batch_labels_vi = train_label_vi[idx*config.batch_size : (idx+1)*config.batch_size]
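
An alternative to editing every call, which this article did not use, is to alias xrange once near the top of the file so the same code runs on both Python 2 and Python 3. The sketch below shows that shim together with the batch index arithmetic from the loop above; batch_slices is a hypothetical helper added only for illustration.

```python
try:
    xrange  # defined on Python 2 only
except NameError:
    xrange = range  # Python 3: range is already lazy


def batch_slices(num_samples, batch_size):
    """Yield (start, stop) index pairs for each full batch."""
    for idx in xrange(num_samples // batch_size):
        yield idx * batch_size, (idx + 1) * batch_size
```

Because of the integer division, samples left over after the last full batch are skipped, which matches the behavior of the training loop.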

Training now starts.

The entire training run took about one hour, with GPU utilization at about 90% as reported by Juchi Cloud.

Training then runs to completion.

The model is saved in the checkpoint/CGAN_120 folder; checking that folder shows that the model files have been updated.
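
One way to confirm the update without comparing listings by eye is to look at modification times. This is a small sketch under the assumption that the checkpoint files sit directly in the folder; newest_file is a hypothetical helper name.

```python
import os


def newest_file(folder):
    """Return (path, mtime) of the most recently modified file in folder, or None."""
    newest = None
    for entry in os.listdir(folder):
        path = os.path.join(folder, entry)
        if os.path.isfile(path):
            mtime = os.path.getmtime(path)
            if newest is None or mtime > newest[1]:
                newest = (path, mtime)
    return newest
```

After training, newest_file("checkpoint/CGAN_120") should report a file whose timestamp falls within the training run.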

Comparing against the folder's contents from before training confirms the update.
Running the test program again lets you compare the results before and after training; that comparison is not shown here, and interested readers can explore it themselves.

Origin blog.csdn.net/chy5764/article/details/132692823