OpenVINO MIMO dynamic-shape inference: a sample record

Foreword

OpenVINO's documentation is sparse and often vague, its examples included. After searching Google for a long time I could not find a problem similar to mine, so I am recording it here for whoever comes next: how to use OpenVINO to run inference on a multi-input, multi-output model.
This article uses the Python bindings of the OpenVINO runtime to first convert the model into the correct format and then run inference.

Model example

First, some basics. OpenVINO can currently load ONNX models directly; you can also convert an ONNX model to IR (.xml) with OpenVINO's Model Optimizer and load that instead. If you have a PyTorch model, convert it to ONNX first. Also note that at inference time OpenVINO's GPU plugin currently does not support dynamic shapes; only the CPU does. That is, on GPU your tensor sizes must be fixed and cannot change between calls. Unfortunately, the model in this example genuinely requires dynamic shapes.
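As an aside, if you are starting from PyTorch, exporting to ONNX with dynamic spatial dimensions might look like the sketch below. The toy Conv2d network and the tensor/file names here are placeholders of mine, not this article's actual model:

import torch
import torch.nn as nn

# Toy stand-in network; replace with your real model.
model = nn.Conv2d(3, 3, kernel_size=3, padding=1).eval()
dummy = torch.randn(1, 3, 720, 1280)

torch.onnx.export(
    model, dummy, "your_model.onnx",
    input_names=["src"], output_names=["out"],
    # dynamic_axes keeps height and width symbolic in the exported graph
    dynamic_axes={"src": {2: "height", 3: "width"},
                  "out": {2: "height", 3: "width"}},
    opset_version=12,
)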
Open our ONNX model in the Netron viewer to inspect its inputs and outputs:
(Screenshot: Netron view of the model inputs.)
The model has six inputs: src, r1i, r2i, r3i, r4i, and downsample_ratio.
(Screenshot: Netron view of the model outputs.)
It also has six outputs: fgr, pha, r1o, r2o, r3o, r4o. What makes this model special is that r1-r4 are recurrent memory tensors whose contents and shapes change from call to call, which is exactly why their sizes cannot be fixed.
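If you would rather not open Netron, the same input/output names can be read programmatically with the onnx package; a minimal sketch, assuming the model file sits at your_model.onnx:

import onnx

m = onnx.load("your_model.onnx")
# Some exporters also list weights in graph.input; filter them out by name.
weight_names = {init.name for init in m.graph.initializer}
for inp in m.graph.input:
    if inp.name not in weight_names:
        print("input: ", inp.name)
for out in m.graph.output:
    print("output:", out.name)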

Model conversion

This article assumes you have already installed the OpenVINO toolkit. Then run the following on the command line:

mo --input_model your_model.onnx \
   --input src,r1i,r2i,r3i,r4i,downsample_ratio \
   --input_shape "(1,3,720,1280),(1,?,?,?),(1,?,?,?),(1,?,?,?),(1,?,?,?),(1)" \
   --output fgr,pha,r1o,r2o,r3o,r4o \
   --data_type FP32

Note how the inputs and outputs are written here: multiple names are separated by commas and must match the names in the graph exactly. The shapes in --input_shape are listed in the same order as the names in --input. In this example, since r1-r4 cannot be fixed, only the first and last shapes are pinned and the rest are filled with question marks (dynamic dimensions). --data_type sets the precision of the model; FP32 is specified here, which generally runs fastest on CPU.
(Screenshot: Model Optimizer console output after a successful conversion.)
If no errors are reported, the conversion finished successfully. Several new files appear; for loading you only reference the .xml (the accompanying .bin holds the weights and must stay next to it).
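As an alternative to fixing shapes at conversion time, newer openvino.runtime releases can also reshape a model after reading it. The following is only a sketch under the assumption of a 2022.x-style API, with -1 marking a dynamic dimension:

from openvino.runtime import Core, PartialShape

ie = Core()
model = ie.read_model("your_model.xml")
# Pin the frame input; leave the recurrent inputs dynamic (-1 = dynamic dim).
model.reshape({
    "src": PartialShape([1, 3, 720, 1280]),
    "r1i": PartialShape([1, -1, -1, -1]),
    "r2i": PartialShape([1, -1, -1, -1]),
    "r3i": PartialShape([1, -1, -1, -1]),
    "r4i": PartialShape([1, -1, -1, -1]),
})
compiled_model = ie.compile_model(model, "CPU")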

Core inference code

This example assumes you have the openvino runtime installed.

from openvino.runtime import Core

ie = Core()
model_openvino = "xxxxx.xml"
model = ie.read_model(model=model_openvino)                        # read the IR from disk
compiled_model = ie.compile_model(model=model, device_name="CPU")  # load onto a device

# Print all input and output layers as a sanity check.
for i in compiled_model.inputs:
    print(i)
for o in compiled_model.outputs:
    print(o)

read_model reads the model from disk, and compile_model loads it onto the target device; device_name selects CPU or GPU. In this example, because of the dynamic sizes, CPU must be used, otherwise an error is raised at load time.
The loops at the end print every input and output layer, which is useful for checking names and shapes. The output looks like this:
(Screenshot: the printed input and output layers.)
Now for the actual inference:

results = compiled_model({
    'src': src,
    'r1i': rec[0],
    'r2i': rec[1],
    'r3i': rec[2],
    'r4i': rec[3],
    'downsample_ratio': downsample_ratio,
})
fgr, pha, *rec = results.values()

When building the input dictionary, make sure every tensor has the size its input expects. The returned results object behaves like a dictionary, and because there are multiple outputs, a single call to its values() method retrieves them all in dictionary order.
Here, *rec stores the remaining four recurrent memory tensors in the rec list so they can be fed back in as inputs on the next call. This is where the dynamic size comes from: the four tensors are empty at initialization and only take on fixed shapes after one inference, which OpenVINO's GPU inference currently cannot handle. In any case, the most important point of this example is how to run inference with multiple inputs and multiple outputs; it took me a long time to figure out.
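To make the recurrent part concrete, here is a sketch of a full frame loop. The zero tensors of shape (1, 1, 1, 1) used to seed the recurrent inputs and the 0.25 downsample ratio are my assumptions for illustration, and the random frames stand in for real video:

import numpy as np

# Dummy recurrent tensors; after the first call they take on whatever shapes
# the model produces, which is exactly why dynamic sizes are needed.
rec = [np.zeros((1, 1, 1, 1), dtype=np.float32) for _ in range(4)]
downsample_ratio = np.array([0.25], dtype=np.float32)  # assumed value

# Random frames as stand-ins for real (1, 3, H, W) float32 video frames.
frames = (np.random.rand(1, 3, 720, 1280).astype(np.float32) for _ in range(2))

for src in frames:
    results = compiled_model({
        'src': src,
        'r1i': rec[0], 'r2i': rec[1], 'r3i': rec[2], 'r4i': rec[3],
        'downsample_ratio': downsample_ratio,
    })
    fgr, pha, *rec = results.values()  # rec is fed back in on the next pass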

Origin: blog.csdn.net/weixin_43945848/article/details/126744427