This article mainly records some pitfalls I ran into and some concepts I had misunderstood. I get familiar with Juchi Cloud (Matpool) by reproducing FusionGAN.
Local environment: PyCharm Professional 2023.2 (only the Professional edition supports SSH; the Community edition does not. Professional is a paid edition, but students can apply for a free license)
1. First, rent the environment
Create an account and follow the Juchi Cloud WeChat official account to get 5 computing power beans, which beginners can use to get familiar with the platform. Most Juchi Cloud machines run Ubuntu and a few run Windows. Windows is only available on machines marked with the Windows icon (labeled "in public beta").
Click Rent to use a machine. Before placing the order, you need to select a system image. Which image to choose depends on the environment your code requires, i.e. the same environment you would install locally; it has nothing to do with the operating system.
For example, FusionGAN is written with the TensorFlow framework, so I need an image with TensorFlow pre-installed. Other bloggers have mentioned that too little GPU memory causes errors, so I simply chose a graphics card with more memory.
If you are placing an order for the first time and have not created an environment yet, you need to select the relevant environment under the system image.
Before releasing the machine, we can choose to save the environment, so that next time we can select "My Environment" and reuse what we have already configured.
Before placing the order, we need to upload the code to the Juchi Cloud network disk: select Workspace > My Network Disk. Juchi Cloud provides 5 GB of free network disk space, where we can store code and datasets in advance. On the rented machine the network disk is mounted under /mnt, and every file uploaded to the network disk can be reached there with the cd command. Here we upload the local code to the network disk. (You can also download it on the server with git, but that wastes rental time, and time = money.) FusionGAN download address: FusionGAN code download address (download the code locally and then upload it to the network disk)
Before starting, back up the code, then delete the dataset from the copy so that only the source code is left, which shortens the upload synchronization. (This step can be skipped; I later found that it saves little time compared with the other problems.)
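The backup-and-trim step can also be scripted. Below is a minimal sketch, not something from the FusionGAN repository: the helper name `copy_source_only` and the default skip patterns are my own assumptions (the folder names `Train_ir` and `Train_vi` are taken from the paths that appear later in this article), so adjust them to whatever your dataset folders are actually called.

```python
import shutil


def copy_source_only(src, dst, skip=("Train_ir", "Train_vi", "*.h5", "*.zip")):
    """Copy the project at src to dst, leaving out datasets and archives.

    The skip patterns are illustrative assumptions; dst must not exist yet.
    """
    ignore = shutil.ignore_patterns(*skip)
    # copytree creates dst and returns its path
    return shutil.copytree(src, dst, ignore=ignore)
```

The original folder is left untouched, so it doubles as the backup, and only the trimmed copy needs to be uploaded.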
Open the project in PyCharm:
Select the required GPU to start renting, and wait for the machine's status to change from starting to running. If the status does not update, refresh the page:
![](https://img-blog.csdnimg.cn/ed01a5a541af4c4dac2a7df58041d362.png#pic_left)
![](https://img-blog.csdnimg.cn/84fd2dd488eb436890892daab0c091a1.png#pic_left)
Select Project > Python Interpreter and click Add Interpreter on the right
Select On SSH
Fill in the host, port, and user name shown in Juchi Cloud, in that order, and then enter the password.
![](https://img-blog.csdnimg.cn/3e827d8d829b4334adf0f53368ef9138.png)
![](https://img-blog.csdnimg.cn/1def07ebf0ba4cf191b2f7f6d1f222cf.png)
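If PyCharm refuses to connect, it helps to first check that the host and port are reachable at all. The following is a rough sketch using only the standard library; the function name `ssh_port_reachable` is made up for this article, and it only tests that a TCP connection can be opened, not that the user name or password is correct.

```python
import socket


def ssh_port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and DNS failures
        return False
```

For the machine used in this article the call would be `ssh_port_reachable("hz-t3.matpool.com", 26634)`; the host and port are different for every rental, so copy them from your own Juchi Cloud page.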
If you do not change every setting that needs changing in this step, an error appears and the connection is dropped, as shown in the figure below.
First select Virtualenv Environment, click the folder icon next to Location, and change the address to
/root/miniconda3/envs/myconda/bin/python
Then change the Base interpreter to
/root/miniconda3/envs/myconda/bin/python
Finally, change Sync folders to
/mnt/.../FusionGAN-master
# the middle part is the folder you named yourself after unzipping
Then switch to System Interpreter, set the interpreter and Sync folders to the same paths as above, and click Create.
Click OK
After creation, the interpreter shown in the lower right corner changes to the server address. PyCharm then uploads the files in the local folder to the folder on the server. Click File Transfer on the left (the double-arrow icon in the lower left corner) to view the upload status.
The upload record shows that, once SSH is connected, the code in the local folder is synchronized to the remote server automatically, so you can edit locally and run remotely. If the synchronization lags behind, select Tools > Deployment > Upload to and synchronize the code manually.
[2023/9/5 20:31] Upload to root@hz-t3.matpool.com:26634 password
[2023/9/5 20:31] Upload file 'C:\Users\Lenovo\Desktop\看过的文章\code\FusionGAN\FusionGAN-master\LICENSE' to '/mnt/LICENSE'
[2023/9/5 20:31] Upload file 'C:\Users\Lenovo\Desktop\看过的文章\code\FusionGAN\FusionGAN-master\main.py' to '/mnt/main.py'
[2023/9/5 20:31] Upload file 'C:\Users\Lenovo\Desktop\看过的文章\code\FusionGAN\FusionGAN-master\model.py' to '/mnt/model.py'
[2023/9/5 20:31] Upload file 'C:\Users\Lenovo\Desktop\看过的文章\code\FusionGAN\FusionGAN-master\README.md' to '/mnt/README.md'
[2023/9/5 20:31] Upload file 'C:\Users\Lenovo\Desktop\看过的文章\code\FusionGAN\FusionGAN-master\test_one_image.py' to '/mnt/test_one_image.py'
[2023/9/5 20:31] Upload file 'C:\Users\Lenovo\Desktop\看过的文章\code\FusionGAN\FusionGAN-master\utils.py' to '/mnt/utils.py'
[2023/9/5 20:31] Upload to root@hz-t3.matpool.com:26634 password completed in 1 sec, 951 ms: 6 files transferred (25.2 kbit/s)
Click the arrow and select the server address
The result looks as shown in the figure
Change into the mnt folder and list its files with ls -a
cd ../mnt
ls -a
Unzip FusionGAN.zip with unzip (if your archive has a different name, replace FusionGAN.zip accordingly)
unzip FusionGAN.zip -d FusionGAN
Decompression successful
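If the unzip command happens to be missing on the image, the same step can be done from Python. This is a small sketch with the standard zipfile module; `unzip_to` is a helper name I made up, and it returns the top-level entries so you can see which folder name to use in the deployment mapping.

```python
import os
import zipfile


def unzip_to(archive, dest):
    """Extract archive into dest and return the top-level names in dest."""
    os.makedirs(dest, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return sorted(os.listdir(dest))
```

Run from /mnt, `unzip_to("FusionGAN.zip", "FusionGAN")` mirrors the `unzip FusionGAN.zip -d FusionGAN` command above.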
Now update the remote synchronization folder: select Tools > Deployment > Configuration > Mappings
and change the address to the newly created folder
Re-upload the files: first click the project root (if Upload to is grayed out, the project is not selected; to update a single code file, click that file instead), then open Tools and upload again
Return to the terminal and enter the folder with the cd command
cd FusionGAN
cd FusionGAN-master
2. Start testing
In the FusionGAN-master folder, enter on the command line
python test_one_image.py
An error occurs, indicating that a package is missing.
Enter
pip install scipy==1.1.0
Package installed successfully
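Rather than discovering missing packages one crash at a time, you can check them all up front. A minimal sketch follows; `missing_modules` is a name I chose for illustration, and it only handles top-level module names (a dotted name whose parent package is absent would raise instead of returning None).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For this project a reasonable call is `missing_modules(["scipy", "h5py", "tensorflow"])`, since those are the packages that show up in the commands and tracebacks of this article. Note that this only tells you whether a package is present, not whether it is the version the code expects.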
Enter again
python test_one_image.py
The test succeeds
The result is saved in the result folder
Download it via Tools > Deployment > Download from root…
You can see that it was downloaded to the local machine successfully
3. Carry out training
On the command line, enter
python main.py
An error occurs
OSError: Can't write data (file write failed:)
This error is caused by insufficient network disk space. The free quota is only 5 GB, while the .h5 files generated during training are stored on the network disk and are much larger than that, so the write fails. The only fix here is to pay: at least 50 GB is needed to run the whole program, so I purchased more than 50 GB of network disk space. (Later it occurred to me that I could have moved the files to the server's own disk, outside the network disk, and run everything from there. Since I had already bought the space, I did not try this; readers who have not paid yet can give it a try.)
After purchasing enough network disk space, a new error appears:
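A quick check of the free space before starting a long run would have saved me this failure. Here is a hedged sketch with the standard library; `has_free_space` is a made-up helper, and I am assuming that the free space reported for the mount reflects the network disk quota, which may not hold on every network file system.

```python
import shutil


def has_free_space(path, required_gb):
    """Return True if the file system holding path has required_gb free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3
```

With the numbers from this article, `has_free_space("/mnt", 50)` should be true before running main.py.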
File "main.py", line 45, in main
srcnn.train(FLAGS)
File "/mnt/FusionGAN/FusionGAN-master/model.py", line 104, in train
train_data_ir, train_label_ir = read_data(data_dir_ir)
File "/mnt/FusionGAN/FusionGAN-master/utils.py", line 31, in read_data
with h5py.File(path, 'r') as hf:
File "/root/miniconda3/envs/myconda/lib/python3.5/site-packages/h5py/_hl/files.py", line 408, in __init__
swmr=swmr)
File "/root/miniconda3/envs/myconda/lib/python3.5/site-packages/h5py/_hl/files.py", line 173, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = './checkpoint/Train_ir/train.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
Change lines 98 and 99 of code in model.py from:
if config.is_train:
    data_dir_ir = os.path.join('./{}'.format(config.checkpoint_dir), "Train_ir","train.h5")
    data_dir_vi = os.path.join('./{}'.format(config.checkpoint_dir), "Train_vi","train.h5")
Change to
if config.is_train:
    data_dir_ir = os.path.join("checkpoint_20", "Train_ir","train.h5")
    data_dir_vi = os.path.join("checkpoint_20", "Train_vi","train.h5")
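Hard-coding the folder works, but it is easy to forget. An alternative is to look for train.h5 in several candidate locations and fail with a readable message. This is only a sketch of that idea, with a helper name (`first_existing`) that is not part of FusionGAN.

```python
import os


def first_existing(paths):
    """Return the first path that is an existing file, else raise."""
    for path in paths:
        if os.path.isfile(path):
            return path
    raise FileNotFoundError("none of these files exist: " + ", ".join(paths))
```

For the infrared data, the candidates from this article would be `["./checkpoint/Train_ir/train.h5", "checkpoint_20/Train_ir/train.h5"]`, and the returned path is what gets passed to `read_data`.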
Another error follows:
Traceback (most recent call last):
File "main.py", line 48, in <module>
tf.app.run()
File "/root/miniconda3/envs/myconda/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "main.py", line 45, in main
srcnn.train(FLAGS)
File "/mnt/FusionGAN/FusionGAN-master/model.py", line 144, in train
for ep in xrange(config.epoch):
NameError: name 'xrange' is not defined
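xrange only exists in Python 2, while the server environment runs Python 3.5 (see the paths in the traceback), so change xrange to range on lines 144 and 147 of model.py: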
for ep in range(config.epoch):
    # Run by batch images
    batch_idxs = len(train_data_ir) // config.batch_size
    for idx in range(0, batch_idxs):
        batch_images_ir = train_data_ir[idx*config.batch_size : (idx+1)*config.batch_size]
        batch_labels_ir = train_label_ir[idx*config.batch_size : (idx+1)*config.batch_size]
        batch_images_vi = train_data_vi[idx*config.batch_size : (idx+1)*config.batch_size]
        batch_labels_vi = train_label_vi[idx*config.batch_size : (idx+1)*config.batch_size]
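If you would rather not edit every xrange by hand, a small compatibility shim at the top of the file gives the same result on Python 3. The sketch below also shows how the batch boundaries in the loop above work out; `batch_slices` is a hypothetical helper written for this explanation, not a function from model.py.

```python
try:
    xrange  # defined on Python 2
except NameError:
    xrange = range  # Python 3: range is already lazy


def batch_slices(n_items, batch_size):
    """Return (start, stop) index pairs for each full batch."""
    n_batches = n_items // batch_size  # leftover items are dropped
    return [(i * batch_size, (i + 1) * batch_size) for i in xrange(n_batches)]


print(batch_slices(10, 3))  # prints [(0, 3), (3, 6), (6, 9)]
```

As the example shows, integer division means the last incomplete batch is skipped, which matches the behaviour of the training loop.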
Training now starts
The whole training run took about one hour, and GPU utilization on Juchi Cloud reached about 90%.
Final running result
The model is saved in the checkpoint/CGAN_120 folder; checking that folder shows that the model has been updated.
The contents of the previous folder
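To confirm that the model files really were rewritten, you can sort the folder by modification time instead of comparing screenshots. A minimal sketch, where `newest_first` is a name I picked for this article:

```python
import os


def newest_first(folder):
    """Return the entry names in folder, most recently modified first."""
    entries = [
        (os.path.getmtime(os.path.join(folder, name)), name)
        for name in os.listdir(folder)
    ]
    return [name for _, name in sorted(entries, reverse=True)]
```

Calling `newest_first("checkpoint/CGAN_120")` after training should list the freshly written model files at the top.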
Run the test program again to compare the results before and after training; they are not shown here. Interested readers can explore on their own.