I came across several articles online comparing the inference speed of ONNX Runtime and PyTorch. Many of them report that ONNX Runtime runs several times faster. Is it really that dramatic? Let me try it myself.
System: Ubuntu 22.04
CPU: Intel Core i7-8750H
GPU: NVIDIA GeForce RTX 3060
For the model, I chose ResNet50, one of the most commonly used architectures:
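```python
import torch
import torchvision.models as models

# Load ResNet50 with ImageNet-pretrained weights
# (torchvision >= 0.13; older versions use pretrained=True)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
```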
First, save the PyTorch model and export it to ONNX:
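```python
# Save the whole PyTorch model (architecture + weights)
torch.save(model, 'resnet.pth')
# Random example input of shape (batch, channels, height, width)
data = torch.rand(1, 3, 224, 224)
# torch.onnx.export runs the model on an example input to build the ONNX graph
torch.onnx.export(model, data, 'resnet.onnx')
```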
Then load both models:
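```python
import onnxruntime
import torch

# PyTorch model; weights_only=False is required to unpickle a full model
# in PyTorch >= 2.6, where torch.load defaults to weights_only=True
torch_model = torch.load('resnet.pth', weights_only=False)
# ONNX model, run on the CPU
onnx_model = onnxruntime.InferenceSession('resnet.onnx', providers=["CPUExecutionProvider"])
# torch_model.to("cuda:0")  # enabled later for the GPU test
```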
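A speed comparison only means something if both runtimes compute the same result, so it can be worth comparing their outputs before timing them. The helper below is a minimal NumPy sketch; `compare_logits` and its default tolerance `atol=1e-4` are hypothetical choices for illustration, not part of PyTorch or ONNX Runtime.

```python
import numpy as np


def compare_logits(reference: np.ndarray, candidate: np.ndarray,
                   atol: float = 1e-4) -> dict:
    """Compare two (batch, classes) logit arrays produced by different runtimes.

    atol is a hypothetical tolerance chosen for illustration.
    """
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    # Largest element-wise deviation between the two outputs
    max_abs_diff = float(np.max(np.abs(reference - candidate)))
    # Whether every sample gets the same predicted class in both runtimes
    same_top1 = bool(np.all(reference.argmax(axis=1) == candidate.argmax(axis=1)))
    return {"max_abs_diff": max_abs_diff, "same_top1": same_top1,
            "close": max_abs_diff <= atol}


if __name__ == "__main__":
    a = np.array([[0.1, 2.0, -1.0]], dtype=np.float32)
    print(compare_logits(a, a + 1e-6))
```

With the models above and the inputs defined in the timing code below, a call might look like `compare_logits(torch_model(torch_data).detach().numpy(), onnx_model.run(None, {onnx_model.get_inputs()[0].name: data})[0])`. While the PyTorch model is still in training mode, its logits would typically not match the ONNX ones, because BatchNorm then normalizes with statistics from the current batch.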
Run each model many times and compute the average time per inference:
```python
from timeit import timeit
import numpy as np

# Same random input for both runtimes: NumPy for ONNX Runtime, a tensor for PyTorch
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
torch_data = torch.from_numpy(data)

def torch_inf():
    torch_model(torch_data)

def onnx_inf():
    onnx_model.run(None, {
        onnx_model.get_inputs()[0].name: data
    })

n = 200
# warm-up (disabled in this run)
# for i in range(1, 100):
#     torch_inf()
torch_t = timeit(torch_inf, number=n) / n
onnx_t = timeit(onnx_inf, number=n) / n
print(f"PyTorch {torch_t} VS ONNX {onnx_t}")
```
The output is:
```
PyTorch 0.12086693297999773 VS ONNX 0.005529450080002789
```
ONNX Runtime takes about 5.5 ms per inference versus about 121 ms for PyTorch, roughly 22 times faster.
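However, there is an important issue here that many articles running this kind of test ignore: the PyTorch model is never switched to inference mode with .eval(). In training mode, layers such as BatchNorm (and Dropout, in models that have it) keep their training behavior; BatchNorm, for instance, normalizes with statistics computed from the current batch and updates its running averages, which is extra work that inference does not need. The exported ONNX model does not have this problem, because torch.onnx.export exports in eval mode by default (its training argument defaults to TrainingMode.EVAL). So we add this to the code: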
```python
torch_model.eval()
```
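In ResNet50 the layers whose behavior depends on this switch are the BatchNorm layers (torchvision's ResNet50 has no Dropout). As a rough illustration of what eval() changes, here is a simplified NumPy sketch of a 2-D BatchNorm layer that follows the behavior described in the PyTorch documentation (momentum 0.1, running statistics updated only in training mode). `BatchNorm2dSketch` is a hypothetical class: it omits the learnable scale and shift and is not PyTorch's implementation.

```python
import numpy as np


class BatchNorm2dSketch:
    """Simplified 2-D batch norm (no learnable scale/shift), for illustration only."""

    def __init__(self, channels: int, momentum: float = 0.1, eps: float = 1e-5):
        self.running_mean = np.zeros(channels)
        self.running_var = np.ones(channels)
        self.momentum = momentum
        self.eps = eps
        self.training = True  # starts in training mode, like an nn.Module

    def eval(self) -> "BatchNorm2dSketch":
        self.training = False
        return self

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (N, C, H, W); statistics are computed per channel
        if self.training:
            # Training mode: normalize with this batch's statistics...
            mean = x.mean(axis=(0, 2, 3))
            var = x.var(axis=(0, 2, 3))  # biased variance for normalization
            # ...and update the running estimates (unbiased variance) on every call
            count = x.size // x.shape[1]
            unbiased_var = var * count / (count - 1)
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * unbiased_var
        else:
            # Eval mode: use the stored running estimates; nothing is updated
            mean, var = self.running_mean, self.running_var
        return (x - mean[None, :, None, None]) / np.sqrt(var[None, :, None, None] + self.eps)


if __name__ == "__main__":
    x = np.random.default_rng(0).random((1, 3, 8, 8))
    bn = BatchNorm2dSketch(3)
    y_train = bn(x)  # updates running_mean / running_var
    print("running mean after one training-mode call:", bn.running_mean)
    y_eval = bn.eval()(x)  # uses the stored statistics instead
    print("outputs differ between modes:", not np.allclose(y_train, y_eval))
```

The point of the sketch is that training mode both does extra work per call and produces outputs that depend on the batch, while eval mode reuses fixed statistics, which is what the exported ONNX graph does.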
With eval() added, the output becomes:
```
PyTorch 0.09768170792000092 VS ONNX 0.006109018884999386
```
The eval() call cuts the PyTorch time by roughly 19% (from about 121 ms to about 98 ms), but ONNX Runtime is still about 16 times faster on this CPU. So what happens if we compare on the GPU?
I replaced the onnxruntime package with onnxruntime-gpu and changed the code slightly:
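```python
import onnxruntime
import torch

# PyTorch model, moved to the GPU and switched to inference mode
torch_model = torch.load('resnet.pth', weights_only=False)
torch_model.to("cuda:0")
torch_model.eval()

# ONNX model; since ONNX Runtime 1.9 the GPU build needs providers set explicitly
onnx_model = onnxruntime.InferenceSession(
    'resnet.onnx', providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
print(onnx_model.get_providers())  # CUDAExecutionProvider should be listed first

# In the timing code, replace the torch_data line so the input is on the GPU too
torch_data = torch.from_numpy(data).to("cuda:0")
```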
This time the output is:
```
PyTorch 0.0061957750350029525 VS ONNX 0.006347028439995484
```
So on the GPU, ONNX Runtime and PyTorch take essentially the same time: about 6.2 ms versus 6.3 ms per inference.
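The timing code above leaves the warm-up commented out and reports only a mean. On the GPU there is a further subtlety: PyTorch launches CUDA kernels asynchronously, so a timer can stop before the queued work has finished unless the code synchronizes first (PyTorch provides torch.cuda.synchronize() for this). Below is a minimal standard-library sketch of a timer that handles both; `benchmark` and its parameters are hypothetical names, and the default warm-up count of 20 is an arbitrary assumption.

```python
import statistics
import time
from typing import Callable, Dict, Optional


def benchmark(fn: Callable[[], object], n: int = 200, warmup: int = 20,
              sync: Optional[Callable[[], None]] = None) -> Dict[str, float]:
    """Time fn() over n runs after `warmup` untimed runs.

    If `sync` is given, it is called after each fn() so that asynchronous
    work (e.g. queued CUDA kernels) finishes before the clock is read.
    """
    # Untimed warm-up runs (caches, lazy initialization, kernel selection, ...)
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    times = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        times.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "stdev": statistics.stdev(times) if n > 1 else 0.0,
    }


if __name__ == "__main__":
    stats = benchmark(lambda: sum(range(10_000)), n=50, warmup=5)
    print(f"mean {stats['mean'] * 1e3:.3f} ms, median {stats['median'] * 1e3:.3f} ms")
```

With the article's functions, a CPU run might be `benchmark(torch_inf)` and `benchmark(onnx_inf)`, and a GPU run `benchmark(torch_inf, sync=torch.cuda.synchronize)`. Reporting the median next to the mean makes occasional slow outliers easier to spot.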
This article is reproduced from the CSDN blog "self-cultivation of a machine vision engineer" (C#, RealSense+CSharp, onnxruntime), as a learning record.