Foreword
OpenVINO's documentation is sparse and vaguely written, examples included. After searching Google for a long time I could not find a problem similar to mine, so I am recording the solution here for whoever comes next: how to use OpenVINO to run inference on a multi-input, multi-output model.
This article uses the Python bindings of the OpenVINO runtime: first convert the model into the right format, then run inference.
Model example
First, some background. OpenVINO can load ONNX models directly, or you can convert an ONNX model into IR (.xml) with OpenVINO's Model Optimizer and then load that. If you have a PyTorch model, export it to ONNX first. Also note that, at the time of writing, OpenVINO's GPU inference does not support dynamic shapes; only the CPU does. In other words, on the GPU your tensor shapes must be fixed and cannot change. Unfortunately, the model in this example genuinely needs dynamic shapes.
Open our ONNX model with the netron tool to inspect its inputs and outputs:
The model has six inputs: src, r1i, r2i, r3i, r4i, and downsample_ratio.
It also has six outputs: fgr, pha, r1o, r2o, r3o, r4o. What makes this model special is that r1–r4 are recurrent memory tensors that change between frames, which means their shapes cannot be fixed.
Model conversion
This article assumes you have already installed the OpenVINO toolkit. Then run on the command line:
mo --input_model your_model.onnx \
   --input src,r1i,r2i,r3i,r4i,downsample_ratio \
   --input_shape (1,3,720,1280),(1,?,?,?),(1,?,?,?),(1,?,?,?),(1,?,?,?),(1) \
   --output fgr,pha,r1o,r2o,r3o,r4o \
   --data_type FP32
Pay attention to how --input and --output are written here: multiple inputs and outputs are separated by commas, and the names must match the model exactly. The shapes in --input_shape have to be pinned down by you; in this example, since r1–r4 cannot be fixed, only the first and last tensors are given concrete shapes and the dynamic dimensions are filled with question marks. --data_type sets the model precision, FP32 in this example; FP32 generally runs faster on the CPU.
If no errors are reported, the conversion completed smoothly. It produces several files; the runtime is given the .xml, and the accompanying .bin (which holds the weights) must stay next to it.
Core inference code
This example assumes you have the OpenVINO runtime installed.
from openvino.runtime import Core

ie = Core()
model_openvino = "xxxxx.xml"
model = ie.read_model(model=model_openvino)
compiled_model = ie.compile_model(model=model, device_name="CPU")

for i in compiled_model.inputs:
    print(i)
for i in compiled_model.outputs:
    print(i)
read_model reads the converted model, and compile_model loads it onto a device; device_name selects whether it runs on the CPU or GPU. In this example, because of the dynamic shapes, the CPU must be used, otherwise loading will raise an error.
The two loops that follow print all input and output layers, which is useful as a sanity check.
The printout should match the six inputs and six outputs we saw earlier in netron.
Now run the actual inference:
results = compiled_model({
    'src': src,
    'r1i': rec[0],
    'r2i': rec[1],
    'r3i': rec[2],
    'r4i': rec[3],
    'downsample_ratio': downsample_ratio
})
fgr, pha, *rec = results.values()
When feeding the inputs, make sure each tensor has the right shape. The returned results object behaves like a dictionary, so with multiple outputs you can simply call values() and receive the outputs in the order they were declared.
As for *rec: it collects the remaining four recurrent memory tensors into the rec list so they can be fed back in as inputs on the next frame. This is where the dynamic shapes come from: the four tensors are dummy placeholders at initialization and only take on their real shape after the first inference pass, which OpenVINO's GPU inference currently cannot handle. In any case, the key point of this example is how to run inference with multiple inputs and multiple outputs; it took me a long time to figure out.
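The feedback loop above can be sketched end to end. This is a minimal, self-contained illustration of the pattern, not the real model: fake_model stands in for compiled_model, and the 1x1x1x1 zero initialization and the (1, 16, 45, 80) recurrent shape are illustrative assumptions, chosen only to show the shapes changing after the first pass.

```python
import numpy as np

# Stand-in for the compiled OpenVINO model: takes the six inputs,
# returns six outputs in declaration order. A real call would be
# `results = compiled_model({...})`.
def fake_model(inputs):
    src = inputs["src"]
    # Pretend the network emits fixed-shape recurrent states
    # (shape chosen arbitrarily for this sketch).
    new_rec = [np.zeros((1, 16, 45, 80), dtype=np.float32) for _ in range(4)]
    return {
        "fgr": src,                 # foreground, same shape as src here
        "pha": src[:, :1],          # single-channel alpha stand-in
        "r1o": new_rec[0], "r2o": new_rec[1],
        "r3o": new_rec[2], "r4o": new_rec[3],
    }

# Recurrent states start as dummy zero tensors; after the first frame
# they take on whatever shape the network emits, which is why the
# input shapes must stay dynamic.
rec = [np.zeros((1, 1, 1, 1), dtype=np.float32)] * 4
downsample_ratio = np.asarray([0.25], dtype=np.float32)

for _ in range(2):  # two dummy frames
    src = np.random.rand(1, 3, 720, 1280).astype(np.float32)
    results = fake_model({
        "src": src,
        "r1i": rec[0], "r2i": rec[1], "r3i": rec[2], "r4i": rec[3],
        "downsample_ratio": downsample_ratio,
    })
    # Python dicts preserve insertion order, so values() yields the
    # outputs in declaration order: fgr, pha, r1o..r4o. The star
    # unpacking sends the last four straight back into rec.
    fgr, pha, *rec = results.values()

print([r.shape for r in rec])
```

The same `fgr, pha, *rec = results.values()` line works unchanged against the real compiled_model, since its result is also an ordered mapping of output names to tensors.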