Which is faster, ONNX Runtime or PyTorch?

I saw some articles online comparing the inference efficiency of ONNX Runtime and PyTorch. Many of the reported results show ONNX Runtime running several times faster. Is it really that magical? Let me give it a try.

        System: Ubuntu 22.04

        CPU: Intel Core i7-8750H

        GPU: NVIDIA GeForce RTX 3060

        Model: ResNet50 (the most commonly used choice)
 

import torch
import torchvision.models as models

# load a ResNet50 pretrained on ImageNet
model = models.resnet50(pretrained=True)
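
Note that on newer torchvision releases (0.13+) the pretrained flag is deprecated in favor of an explicit weights argument; the equivalent call looks like this:

# equivalent on torchvision >= 0.13, where pretrained= is deprecated
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)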

First, save the PyTorch model and export it to ONNX.

 

# save the PyTorch model
torch.save(model, 'resnet.pth')

# random input: the ONNX exporter needs a sample tensor to trace the model
data = torch.rand(1, 3, 224, 224)
torch.onnx.export(model, data, 'resnet.onnx')
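
As an optional sanity check (not used in the timings below, and assuming the onnx package is installed), the exported file can be validated before loading it anywhere:

import onnx

# verify that the exported graph is structurally well-formed
onnx.checker.check_model(onnx.load('resnet.onnx'))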

 

Then load both models.

import onnxruntime

# PyTorch model
torch_model = torch.load('resnet.pth')
# ONNX Runtime inference session
onnx_model = onnxruntime.InferenceSession('resnet.onnx')
# torch_model.to("cuda:0")  # kept commented out; the GPU run comes later
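
A quick check worth adding here (not part of the timed runs) is to print which execution providers this onnxruntime build offers and which ones the session actually selected:

# providers available in this onnxruntime build
print(onnxruntime.get_available_providers())
# providers this session ended up using (CPU-only here)
print(onnx_model.get_providers())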

 

Run each model multiple times and calculate the average time.

from timeit import timeit
import numpy as np

data = np.random.rand(1, 3, 224, 224).astype(np.float32)
torch_data = torch.from_numpy(data)


def torch_inf():
    torch_model(torch_data)

def onnx_inf():
    onnx_model.run(None, {
        onnx_model.get_inputs()[0].name: data
    })

n = 200

# warmup
# for i in range(1, 100):
#     torch_inf()
torch_t = timeit(lambda: torch_inf(), number=n) / n
onnx_t = timeit(lambda: onnx_inf(), number=n) / n

print(f"PyTorch {torch_t} VS ONNX {onnx_t}")

 

The result looks like this:

PyTorch 0.12086693297999773 VS ONNX 0.005529450080002789

Here, we can see that the ONNX Runtime time is much lower than PyTorch's.

       

However, there is an important issue here: many articles that test this way forget to call model .eval(). Without it, layers that behave differently during training (such as BatchNorm and Dropout) stay in training mode and do extra work that inference does not need. So, we add this to the code:

torch_model.eval()
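
For intuition, here is a minimal sketch (an illustration, not part of the benchmark) of how a mode-dependent layer such as Dropout behaves differently in the two modes. For ResNet50 specifically, eval() mainly switches BatchNorm from computing batch statistics to using its stored running statistics.

import torch
import torch.nn as nn

layer = nn.Dropout(p=0.5)
x = torch.ones(1, 4)

layer.train()  # training mode: randomly zeroes elements and rescales the rest
print(layer(x))
layer.eval()   # eval mode: passes the input through unchanged
print(layer(x))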

With that change, the output becomes:

PyTorch 0.09768170792000092 VS ONNX 0.006109018884999386

As you can see, the efficiency difference is no longer that huge. So what happens if we compare on the GPU?

I switched onnxruntime to onnxruntime-gpu, and then changed the code slightly:

import onnxruntime

# PyTorch model
torch_model = torch.load('resnet.pth')
# ONNX Runtime session; newer onnxruntime-gpu builds require the providers to be listed explicitly
onnx_model = onnxruntime.InferenceSession('resnet.onnx',
                                          providers=['CUDAExecutionProvider'])
torch_model.to("cuda:0")
torch_data = torch.from_numpy(data).to("cuda:0")
torch_model.eval()
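
One caution when timing on the GPU: CUDA kernels launch asynchronously, so a Python call can return before the GPU has actually finished. A sketch of a more trustworthy GPU timing function (not used in the run below):

def torch_inf_gpu():
    with torch.no_grad():
        torch_model(torch_data)
    # block until all queued GPU work is done before the timer stops
    torch.cuda.synchronize()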

The output now is:

PyTorch 0.0061957750350029525 VS ONNX 0.006347028439995484

So, in the GPU environment, the runtime with onnxruntime is essentially the same as with PyTorch.

Reproduced, as a learning record, from the CSDN blog "The self-cultivation of a machine vision engineer" (C#, RealSense+CSharp, onnxruntime).


Origin: blog.csdn.net/weixin_45303602/article/details/132642377