Model Deployment Getting Started Tutorial (7): TensorRT Model Construction and Inference


Table of Contents

Introduction to TensorRT

Install TensorRT

Windows

Linux

Model Building

Building Directly

Converting an IR Model

Model Inference

Inference using the Python API

Inference using the C++ API

Summary

FAQ

Series Portal


The model deployment tutorial series continues! After the previous articles, you should have a fairly complete picture of ONNX as an intermediate representation. In a real production environment, however, an ONNX model usually still has to be converted into a format that a specific inference backend can consume. In this tutorial, we introduce the well-known inference backend TensorRT.

Introduction to TensorRT

TensorRT is a deep learning inference SDK released by NVIDIA for running inference on its hardware. It provides both quantization-aware training and post-training (offline) quantization, and users can choose between the INT8 and FP16 optimization modes to deploy deep learning models in production for tasks such as video streaming, speech recognition, recommendation, fraud detection, text generation, and natural language processing. TensorRT is heavily optimized for NVIDIA GPUs and is probably the fastest inference engine currently available for running models on them. More details about TensorRT can be found on the TensorRT official website.

Install TensorRT

Windows

We assume a machine with an NVIDIA graphics card on which CUDA and cuDNN are already installed. Log in to the NVIDIA website and download the TensorRT archive that matches the host's CUDA version.

Taking CUDA 10.2 as an example, select the zip package built for CUDA 10.2. After the download is complete, if you use a conda virtual environment, switch to it first, then run commands similar to the following in PowerShell to install and test TensorRT:

cd \the\path\of\tensorrt\zip\file 
Expand-Archive TensorRT-8.2.5.1.Windows10.x86_64.cuda-10.2.cudnn8.2.zip . 
$env:TENSORRT_DIR = "$pwd\TensorRT-8.2.5.1" 
$env:path = "$env:TENSORRT_DIR\lib;" + $env:path 
pip install $env:TENSORRT_DIR\python\tensorrt-8.2.5.1-cp36-none-win_amd64.whl 
python -c "import tensorrt;print(tensorrt.__version__)" 

The last command prints the installed TensorRT version. If the output is 8.2.5.1, the Python package was installed successfully.

Linux

As on Windows, we assume a machine with an NVIDIA graphics card on which CUDA and cuDNN are already installed. Log in to the NVIDIA website and download the TensorRT archive that matches the host's CUDA version.

Taking CUDA 10.2 as an example again, select the tar package built for CUDA 10.2, then run commands similar to the following to install and test TensorRT:

cd /the/path/of/tensorrt/tar/gz/file 
tar -zxvf TensorRT-8.2.5.1.linux.x86_64-gnu.cuda-10.2.cudnn8.2.tar.gz 
export TENSORRT_DIR=$(pwd)/TensorRT-8.2.5.1 
export LD_LIBRARY_PATH=$TENSORRT_DIR/lib:$LD_LIBRARY_PATH 
pip install TensorRT-8.2.5.1/python/tensorrt-8.2.5.1-cp37-none-linux_x86_64.whl 
python -c "import tensorrt;print(tensorrt.__version__)" 

If the output is 8.2.5.1, the Python package was installed successfully.
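
On either platform, a slightly fuller sanity check is to create a Builder and query the platform's capabilities. The following is a minimal sketch (assuming TensorRT 8.x and a working NVIDIA driver); querying these properties touches the GPU, so it also verifies that TensorRT can see your device:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)   # creating a Builder loads the native TensorRT libraries
print('fast FP16 support:', builder.platform_has_fast_fp16)
print('fast INT8 support:', builder.platform_has_fast_int8)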

Model Building

There are two main ways to generate a TensorRT model:

  1. Build the network layer by layer directly with the TensorRT API;
  2. Convert an intermediate representation model into a TensorRT model, for example converting an ONNX model into a TensorRT model.

Next, we will build TensorRT models in both of these ways, in Python and in C++, and then use the generated models for inference.

Building Directly

Building a network layer by layer with the TensorRT API is similar to building a network in a general training framework such as PyTorch or TensorFlow. Note that for layers with weights, such as convolution or normalization layers, the weight values also need to be assigned to the TensorRT network. This article does not cover that in detail and only builds a simple network that max-pools its input; a brief sketch of what assigning weights looks like is shown below.
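
As a brief illustration of what assigning weights means (this snippet is not used in the rest of the tutorial, and the weight values are placeholders), a convolution layer could be added to a TensorRT network with explicit numpy weights roughly like this:

import numpy as np
import tensorrt as trt

logger = trt.Logger()
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
input_tensor = network.add_input('input', trt.float32, (1, 3, 224, 224))

# Placeholder weights; in practice they would come from a trained model,
# e.g. a PyTorch state_dict converted to numpy arrays.
kernel = np.random.rand(8, 3, 3, 3).astype(np.float32)
bias = np.zeros(8, dtype=np.float32)
conv = network.add_convolution_nd(
    input_tensor, num_output_maps=8, kernel_shape=(3, 3),
    kernel=trt.Weights(kernel), bias=trt.Weights(bias))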

Build with Python API

First, we use the Python API to build the TensorRT network directly. This approach mainly relies on tensorrt.Builder's create_builder_config and create_network methods, which create the config and the network respectively. The config holds parameters such as the maximum workspace size, while the network is the body of the model, whose content has to be added layer by layer.

In addition, we need to define the input and output names, serialize the constructed network, and save it to a local file. Note that if you want the network to accept inputs and outputs of different resolutions, you need tensorrt.Builder's create_optimization_profile method and have to set the minimum, optimal, and maximum sizes.

The implementation code is as follows:

import tensorrt as trt 
 
verbose = True 
IN_NAME = 'input' 
OUT_NAME = 'output' 
IN_H = 224 
IN_W = 224 
BATCH_SIZE = 1 
 
EXPLICIT_BATCH = 1 << (int)( 
    trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH) 
 
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE) if verbose else trt.Logger() 
with trt.Builder(TRT_LOGGER) as builder, builder.create_builder_config( 
) as config, builder.create_network(EXPLICIT_BATCH) as network: 
    # define network 
    input_tensor = network.add_input( 
        name=IN_NAME, dtype=trt.float32, shape=(BATCH_SIZE, 3, IN_H, IN_W)) 
    pool = network.add_pooling( 
        input=input_tensor, type=trt.PoolingType.MAX, window_size=(2, 2)) 
    pool.stride = (2, 2) 
    pool.get_output(0).name = OUT_NAME 
    network.mark_output(pool.get_output(0)) 
 
    # serialize the model to engine file 
    profile = builder.create_optimization_profile() 
    profile.set_shape_input('input', *[[BATCH_SIZE, 3, IN_H, IN_W]]*3)  
    builder.max_batch_size = 1 
    config.max_workspace_size = 1 << 30 
    engine = builder.build_engine(network, config) 
    with open('model_python_trt.engine', mode='wb') as f: 
        f.write(bytearray(engine.serialize())) 
        print("generating file done!") 

Build with C++ API

For readers who prefer to build the network directly in C++, the overall flow is very similar to the Python one above. The main points to note are:

  1. nvinfer1::createInferBuilder corresponds to tensorrt.Builder in Python and must be passed an instance of ILogger. However, ILogger is an abstract class, so the user normally has to subclass it and implement its virtual functions. Here we simply use the Logger subclass provided in ../samples/common/logger.h inside the samples folder of the extracted TensorRT package.
  2. Setting the input sizes of the TensorRT model requires several calls to IOptimizationProfile's setDimensions, which is a bit more verbose than in Python. The IOptimizationProfile is created with the createOptimizationProfile function, corresponding to Python's create_optimization_profile method.

The implementation code is as follows:

#include <fstream> 
#include <iostream> 
 
#include <NvInfer.h> 
#include <../samples/common/logger.h> 
 
using namespace nvinfer1; 
using namespace sample; 
 
const char* IN_NAME = "input"; 
const char* OUT_NAME = "output"; 
static const int IN_H = 224; 
static const int IN_W = 224; 
static const int BATCH_SIZE = 1; 
static const int EXPLICIT_BATCH = 1 << (int)(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH); 
 
int main(int argc, char** argv) 
{ 
        // Create builder 
        Logger m_logger; 
        IBuilder* builder = createInferBuilder(m_logger); 
        IBuilderConfig* config = builder->createBuilderConfig(); 
 
        // Create model to populate the network 
        INetworkDefinition* network = builder->createNetworkV2(EXPLICIT_BATCH); 
        ITensor* input_tensor = network->addInput(IN_NAME, DataType::kFLOAT, Dims4{ BATCH_SIZE, 3, IN_H, IN_W }); 
        IPoolingLayer* pool = network->addPoolingNd(*input_tensor, PoolingType::kMAX, DimsHW{ 2, 2 }); 
        pool->setStrideNd(DimsHW{ 2, 2 }); 
        pool->getOutput(0)->setName(OUT_NAME); 
        network->markOutput(*pool->getOutput(0)); 
 
        // Build engine 
        IOptimizationProfile* profile = builder->createOptimizationProfile(); 
        profile->setDimensions(IN_NAME, OptProfileSelector::kMIN, Dims4(BATCH_SIZE, 3, IN_H, IN_W)); 
        profile->setDimensions(IN_NAME, OptProfileSelector::kOPT, Dims4(BATCH_SIZE, 3, IN_H, IN_W)); 
        profile->setDimensions(IN_NAME, OptProfileSelector::kMAX, Dims4(BATCH_SIZE, 3, IN_H, IN_W)); 
        config->setMaxWorkspaceSize(1 << 20); 
        ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config); 
 
        // Serialize the model to engine file 
        IHostMemory* modelStream{ nullptr }; 
        assert(engine != nullptr); 
        modelStream = engine->serialize(); 
 
        std::ofstream p("model.engine", std::ios::binary); 
        if (!p) { 
                std::cerr << "could not open output file to save model" << std::endl; 
                return -1; 
        } 
        p.write(reinterpret_cast<const char*>(modelStream->data()), modelStream->size()); 
        std::cout << "generating file done!" << std::endl; 
 
        // Release resources 
        modelStream->destroy(); 
        network->destroy(); 
        engine->destroy(); 
        builder->destroy(); 
        config->destroy(); 
        return 0; 
} 

Converting an IR Model

In addition to building the network layer by layer and serializing the model directly through TensorRT's API, TensorRT also supports converting intermediate representation models (such as ONNX) into TensorRT models.

Convert using the Python API

We first use PyTorch to implement a model equivalent to the one above, i.e. a model that simply max-pools its input once; then we export the PyTorch model to ONNX; and finally we convert the ONNX model into a TensorRT model.

The key TensorRT component used here is OnnxParser, which parses an ONNX model into a TensorRT network. In the end we obtain a TensorRT model whose behavior matches that of the model built directly in the previous section.

The implementation code is as follows:

import torch 
import onnx 
import tensorrt as trt 
 
 
onnx_model = 'model.onnx' 
 
class NaiveModel(torch.nn.Module): 
    def __init__(self): 
        super().__init__() 
        self.pool = torch.nn.MaxPool2d(2, 2) 
 
    def forward(self, x): 
        return self.pool(x) 
 
device = torch.device('cuda:0') 
 
# generate ONNX model 
torch.onnx.export(NaiveModel(), torch.randn(1, 3, 224, 224), onnx_model, input_names=['input'], output_names=['output'], opset_version=11) 
onnx_model = onnx.load(onnx_model) 
 
# create builder and network 
logger = trt.Logger(trt.Logger.ERROR) 
builder = trt.Builder(logger) 
EXPLICIT_BATCH = 1 << (int)( 
    trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH) 
network = builder.create_network(EXPLICIT_BATCH) 
 
# parse onnx 
parser = trt.OnnxParser(network, logger) 
 
if not parser.parse(onnx_model.SerializeToString()): 
    error_msgs = '' 
    for error in range(parser.num_errors): 
        error_msgs += f'{parser.get_error(error)}\n' 
    raise RuntimeError(f'Failed to parse onnx, {error_msgs}') 
 
config = builder.create_builder_config() 
config.max_workspace_size = 1<<20 
profile = builder.create_optimization_profile() 
 
profile.set_shape('input', [1, 3, 224, 224], [1, 3, 224, 224], [1, 3, 224, 224]) 
config.add_optimization_profile(profile) 
# create engine 
with torch.cuda.device(device): 
    engine = builder.build_engine(network, config) 
 
with open('model.engine', mode='wb') as f: 
    f.write(bytearray(engine.serialize())) 
    print("generating file done!") 
 

During IR conversion, if multiple batch sizes, multiple inputs, or dynamic shapes are needed, they can be configured by calling set_shape multiple times. set_shape takes the input tensor name, the minimum acceptable input size, the optimal input size, and the maximum acceptable input size; each dimension of these three shapes is generally required to be non-decreasing from minimum to optimal to maximum.
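
For example, a dynamic-batch profile might look like the following sketch (this assumes the ONNX model was exported with a dynamic batch dimension, e.g. via the dynamic_axes argument of torch.onnx.export; the shapes below are illustrative):

import tensorrt as trt

logger = trt.Logger()
builder = trt.Builder(logger)
config = builder.create_builder_config()
profile = builder.create_optimization_profile()
# min / opt / max shapes for an input named 'input' with a dynamic batch dimension
profile.set_shape('input', [1, 3, 224, 224], [4, 3, 224, 224], [8, 3, 224, 224])
config.add_optimization_profile(profile)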

Convert using the C++ API

Having shown how to convert an ONNX model to a TensorRT model in Python, we now do the same in C++. With NvOnnxParser, we can parse the ONNX file obtained in the previous section directly into a TensorRT network.

The implementation code is as follows:

#include <fstream> 
#include <iostream> 
 
#include <NvInfer.h> 
#include <NvOnnxParser.h> 
#include <../samples/common/logger.h> 
 
using namespace nvinfer1; 
using namespace nvonnxparser; 
using namespace sample; 
 
int main(int argc, char** argv) 
{ 
        // Create builder 
        Logger m_logger; 
        IBuilder* builder = createInferBuilder(m_logger); 
        const auto explicitBatch = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH); 
        IBuilderConfig* config = builder->createBuilderConfig(); 
 
        // Create model to populate the network 
        INetworkDefinition* network = builder->createNetworkV2(explicitBatch); 
 
        // Parse ONNX file 
        IParser* parser = nvonnxparser::createParser(*network, m_logger); 
        bool parser_status = parser->parseFromFile("model.onnx", static_cast<int>(ILogger::Severity::kWARNING)); 
 
        // Get the name of network input 
        Dims dim = network->getInput(0)->getDimensions(); 
        if (dim.d[0] == -1)  // -1 means it is a dynamic model 
        { 
                const char* name = network->getInput(0)->getName(); 
                IOptimizationProfile* profile = builder->createOptimizationProfile(); 
                profile->setDimensions(name, OptProfileSelector::kMIN, Dims4(1, dim.d[1], dim.d[2], dim.d[3])); 
                profile->setDimensions(name, OptProfileSelector::kOPT, Dims4(1, dim.d[1], dim.d[2], dim.d[3])); 
                profile->setDimensions(name, OptProfileSelector::kMAX, Dims4(1, dim.d[1], dim.d[2], dim.d[3])); 
                config->addOptimizationProfile(profile); 
        } 
 
 
        // Build engine 
        config->setMaxWorkspaceSize(1 << 20); 
        ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config); 
 
        // Serialize the model to engine file 
        IHostMemory* modelStream{ nullptr }; 
        assert(engine != nullptr); 
        modelStream = engine->serialize(); 
 
        std::ofstream p("model.engine", std::ios::binary); 
        if (!p) { 
                std::cerr << "could not open output file to save model" << std::endl; 
                return -1; 
        } 
        p.write(reinterpret_cast<const char*>(modelStream->data()), modelStream->size()); 
        std::cout << "generate file success!" << std::endl; 
 
        // Release resources 
        modelStream->destroy(); 
        network->destroy(); 
        engine->destroy(); 
        builder->destroy(); 
        config->destroy(); 
        return 0; 
} 
 

Model Inference

So far we have built TensorRT models in two ways and generated four TensorRT engines in total, two from Python and two from C++. In theory, these four models behave identically.

Next, we run inference on the generated TensorRT models from both Python and C++.

Inference using the Python API

First, we run the TensorRT model with the Python API. Part of the code below is adapted from MMDeploy. Running it, you will see that a 1x3x224x224 tensor goes in and a 1x3x112x112 tensor comes out, exactly what we expect after pooling the input.

from typing import Any, Dict, Optional, Sequence, Union 
 
import torch 
import tensorrt as trt 
 
class TRTWrapper(torch.nn.Module): 
    def __init__(self,engine: Union[str, trt.ICudaEngine], 
                 output_names: Optional[Sequence[str]] = None) -> None: 
        super().__init__() 
        self.engine = engine 
        if isinstance(self.engine, str): 
            with trt.Logger() as logger, trt.Runtime(logger) as runtime: 
                with open(self.engine, mode='rb') as f: 
                    engine_bytes = f.read() 
                self.engine = runtime.deserialize_cuda_engine(engine_bytes) 
        self.context = self.engine.create_execution_context() 
        names = [_ for _ in self.engine] 
        input_names = list(filter(self.engine.binding_is_input, names)) 
        self._input_names = input_names 
        self._output_names = output_names 
 
        if self._output_names is None: 
            output_names = list(set(names) - set(input_names)) 
            self._output_names = output_names 
 
    def forward(self, inputs: Dict[str, torch.Tensor]): 
        assert self._input_names is not None 
        assert self._output_names is not None 
        bindings = [None] * (len(self._input_names) + len(self._output_names)) 
        profile_id = 0 
        for input_name, input_tensor in inputs.items(): 
            # check if input shape is valid 
            profile = self.engine.get_profile_shape(profile_id, input_name) 
            assert input_tensor.dim() == len( 
                profile[0]), 'Input dim is different from engine profile.' 
            for s_min, s_input, s_max in zip(profile[0], input_tensor.shape, 
                                             profile[2]): 
                assert s_min <= s_input <= s_max, \ 
                    'Input shape should be between ' \ 
                    + f'{profile[0]} and {profile[2]}' \ 
                    + f' but get {tuple(input_tensor.shape)}.' 
            idx = self.engine.get_binding_index(input_name) 
 
            # All input tensors must be gpu variables 
            assert 'cuda' in input_tensor.device.type 
            input_tensor = input_tensor.contiguous() 
            if input_tensor.dtype == torch.long: 
                input_tensor = input_tensor.int() 
            self.context.set_binding_shape(idx, tuple(input_tensor.shape)) 
            bindings[idx] = input_tensor.contiguous().data_ptr() 
 
        # create output tensors 
        outputs = {} 
        for output_name in self._output_names: 
            idx = self.engine.get_binding_index(output_name) 
            dtype = torch.float32 
            shape = tuple(self.context.get_binding_shape(idx)) 
 
            device = torch.device('cuda') 
            output = torch.empty(size=shape, dtype=dtype, device=device) 
            outputs[output_name] = output 
            bindings[idx] = output.data_ptr() 
        self.context.execute_async_v2(bindings, 
                                      torch.cuda.current_stream().cuda_stream) 
        return outputs 
 
model = TRTWrapper('model.engine', ['output']) 
output = model(dict(input = torch.randn(1, 3, 224, 224).cuda())) 
print(output) 
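
As a quick follow-up check (reusing the model object and imports from the script above), the TensorRT output can be compared against PyTorch's own max pooling; the difference should be essentially zero:

x = torch.randn(1, 3, 224, 224).cuda()
trt_out = model(dict(input=x))['output']
torch_out = torch.nn.MaxPool2d(2, 2)(x)
# Maximum absolute difference between the TensorRT and PyTorch results
print(torch.max(torch.abs(trt_out - torch_out)))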

Inference using the C++ API

Finally, many real production environments use C++ to get the most efficient code. In addition, TensorRT users generally care most about its C++ interface, so we implement model inference once more in C++, which can also be compared against the Python API version above.

The implementation code is as follows:

#include <fstream> 
#include <iostream> 
 
#include <NvInfer.h> 
#include <../samples/common/logger.h> 
 
#define CHECK(status) \ 
    do\ 
    {\ 
        auto ret = (status);\ 
        if (ret != 0)\ 
        {\ 
            std::cerr << "Cuda failure: " << ret << std::endl;\ 
            abort();\ 
        }\ 
    } while (0) 
 
using namespace nvinfer1; 
using namespace sample; 
 
const char* IN_NAME = "input"; 
const char* OUT_NAME = "output"; 
static const int IN_H = 224; 
static const int IN_W = 224; 
static const int BATCH_SIZE = 1; 
static const int EXPLICIT_BATCH = 1 << (int)(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH); 
 
 
void doInference(IExecutionContext& context, float* input, float* output, int batchSize) 
{ 
        const ICudaEngine& engine = context.getEngine(); 
 
        // Pointers to input and output device buffers to pass to engine. 
        // Engine requires exactly IEngine::getNbBindings() number of buffers. 
        assert(engine.getNbBindings() == 2); 
        void* buffers[2]; 
 
        // In order to bind the buffers, we need to know the names of the input and output tensors. 
        // Note that indices are guaranteed to be less than IEngine::getNbBindings() 
        const int inputIndex = engine.getBindingIndex(IN_NAME); 
        const int outputIndex = engine.getBindingIndex(OUT_NAME); 
 
        // Create GPU buffers on device 
        CHECK(cudaMalloc(&buffers[inputIndex], batchSize * 3 * IN_H * IN_W * sizeof(float))); 
        CHECK(cudaMalloc(&buffers[outputIndex], batchSize * 3 * IN_H * IN_W /4 * sizeof(float))); 
 
        // Create stream 
        cudaStream_t stream; 
        CHECK(cudaStreamCreate(&stream)); 
 
        // DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host 
        CHECK(cudaMemcpyAsync(buffers[inputIndex], input, batchSize * 3 * IN_H * IN_W * sizeof(float), cudaMemcpyHostToDevice, stream)); 
        context.enqueue(batchSize, buffers, stream, nullptr); 
        CHECK(cudaMemcpyAsync(output, buffers[outputIndex], batchSize * 3 * IN_H * IN_W / 4 * sizeof(float), cudaMemcpyDeviceToHost, stream)); 
        cudaStreamSynchronize(stream); 
 
        // Release stream and buffers 
        cudaStreamDestroy(stream); 
        CHECK(cudaFree(buffers[inputIndex])); 
        CHECK(cudaFree(buffers[outputIndex])); 
} 
 
int main(int argc, char** argv) 
{ 
        // create a model using the API directly and serialize it to a stream 
        char *trtModelStream{ nullptr }; 
        size_t size{ 0 }; 
 
        std::ifstream file("model.engine", std::ios::binary); 
        if (file.good()) { 
                file.seekg(0, file.end); 
                size = file.tellg(); 
                file.seekg(0, file.beg); 
                trtModelStream = new char[size]; 
                assert(trtModelStream); 
                file.read(trtModelStream, size); 
                file.close(); 
        } 
 
        Logger m_logger; 
        IRuntime* runtime = createInferRuntime(m_logger); 
        assert(runtime != nullptr); 
        ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream, size, nullptr); 
        assert(engine != nullptr); 
        IExecutionContext* context = engine->createExecutionContext(); 
        assert(context != nullptr); 
 
        // generate input data 
        float data[BATCH_SIZE * 3 * IN_H * IN_W]; 
        for (int i = 0; i < BATCH_SIZE * 3 * IN_H * IN_W; i++) 
                data[i] = 1; 
 
        // Run inference 
        float prob[BATCH_SIZE * 3 * IN_H * IN_W /4]; 
        doInference(*context, data, prob, BATCH_SIZE); 
 
        // Destroy the engine 
        context->destroy(); 
        engine->destroy(); 
        runtime->destroy(); 
        return 0; 
} 

Summary

In this article we covered two ways to build a TensorRT model: constructing the network layer by layer through the TensorRT API, and converting an intermediate representation model into a TensorRT model. On top of that, we built and ran TensorRT models in both C++ and Python. Hopefully you have learned something useful! In the next article, we will look at how to add custom TensorRT operators, so stay tuned~

FAQ

  • Q: Running the code reports an error: Could not find: cudnn64_8.dll. Is it on your PATH?
  • A: First check whether cudnn64_8.dll can be found on your PATH. If the cuDNN directory on your PATH is, for example, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin but it only contains cudnn64_7.dll, the fix is to download the cuDNN zip package from the NVIDIA website, unzip it, and copy cudnn64_8.dll into the CUDA Toolkit bin directory. Alternatively, copying cudnn64_7.dll and renaming the copy to cudnn64_8.dll can also work around the problem. (A small script for checking your PATH is sketched below.)
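
To check this quickly, a small diagnostic sketch like the following (Windows-oriented, using only the standard library) lists which directories on PATH actually contain cudnn64_8.dll:

import os
from pathlib import Path

dll_name = 'cudnn64_8.dll'
# Collect every PATH entry that actually contains the DLL.
hits = [d for d in os.environ.get('PATH', '').split(os.pathsep)
        if d and (Path(d) / dll_name).is_file()]
print(hits if hits else f'{dll_name} not found on PATH')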

Series Portal

OpenMMLab: Introduction to Model Deployment (1): Introduction to Model Deployment

OpenMMLab: Introduction to Model Deployment Tutorial (2): Solving the Difficulties in Model Deployment

OpenMMLab: Model Deployment Introduction Tutorial (3): PyTorch to ONNX Detailed Explanation

OpenMMLab: Introduction to Model Deployment Tutorial (4): Supporting More ONNX Operators in PyTorch

OpenMMLab: Introduction to Model Deployment Tutorial (5): Modification and Debugging of ONNX Model

OpenMMLab: Model Deployment Getting Started Tutorial (6): Implementing PyTorch-ONNX Precision Alignment Tool

OpenMMLab: Interpretation of TorchScript (1): Getting to Know TorchScript

OpenMMLab: Interpretation of TorchScript (2): Torch jit tracer implementation analysis

OpenMMLab: Interpretation of TorchScript (3): subgraph rewriter in jit

OpenMMLab: Interpretation of TorchScript (4): Alias Analysis in Torch jit
