NN Acceleration Series, Part 3 | TensorRT: Benchmarking Network Models with trtexec

Introduction

The trtexec README describes what the tool is for:

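  • It’s useful for benchmarking networks on random data. In other words, it measures a network model's inference performance.
  • It’s useful for generating serialized engines from models. In other words, it builds a serialized engine for a given network model.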

There are three examples in total.

Load a model file and its weights, then save the engine

Load GoogleNet and save the engine:
E:\Speed_up\TensorRT-5.1.5.0\bin\trtexec.exe --deploy=E:\Speed_up\TensorRT-5.1.5.0\data\googlenet\googlenet.prototxt --model=E:\Speed_up\TensorRT-5.1.5.0\data\googlenet\googlenet.caffemodel --output=prob --batch=16 --saveEngine=E:\Speed_up\TensorRT-5.1.5.0\data\googlenet\mnist.trt

The console prints the average time of 10 forward inference runs.
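The long command line above is easy to mistype, so here is a minimal Python sketch that assembles the same arguments from the install directory. The helper name `build_save_engine_cmd` is hypothetical; the flags and file names are copied from the command above, and the script only prints the command rather than running it.

```python
from pathlib import PureWindowsPath


def build_save_engine_cmd(trt_root, batch=16, output="prob"):
    """Assemble the trtexec argument list used above (hypothetical helper)."""
    root = PureWindowsPath(trt_root)
    model_dir = root / "data" / "googlenet"
    return [
        str(root / "bin" / "trtexec.exe"),
        f"--deploy={model_dir / 'googlenet.prototxt'}",   # Caffe network definition
        f"--model={model_dir / 'googlenet.caffemodel'}",  # Caffe weights
        f"--output={output}",                             # output blob name
        f"--batch={batch}",                               # batch size
        f"--saveEngine={model_dir / 'mnist.trt'}",        # engine file name as in the post
    ]


cmd = build_save_engine_cmd(r"E:\Speed_up\TensorRT-5.1.5.0")
print(" ".join(cmd))

# On a machine with TensorRT installed, the list could be run with:
# import subprocess; subprocess.run(cmd, check=True)
```

Keeping the arguments in a list avoids quoting problems if the install path ever contains spaces.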

Use the generated engine for benchmarking

E:\Speed_up\TensorRT-5.1.5.0\bin\trtexec.exe --loadEngine=E:\Speed_up\TensorRT-5.1.5.0\data\googlenet\mnist.trt --batch=16

The console again prints the average time of 10 forward inference runs.

Use FP16/INT8 computation

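The GPU in this machine is a GeForce GTX 950M, which has compute capability 5.0.
To test reduced precision, add the --fp16 or --int8 flag to the trtexec command. Note that precision is chosen when TensorRT builds an engine, so these flags matter when trtexec builds from the model files; they are not expected to change an engine that has already been serialized.
With --fp16: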
[I] Average over 10 runs is 48.4361 ms (host walltime is 50.252 ms, 99% percentile time is 50.1577).
[I] Average over 10 runs is 46.9483 ms (host walltime is 47.3756 ms, 99% percentile time is 47.2867).
[I] Average over 10 runs is 46.9146 ms (host walltime is 47.3432 ms, 99% percentile time is 47.2326).
[I] Average over 10 runs is 46.9561 ms (host walltime is 47.3999 ms, 99% percentile time is 47.2245).
[I] Average over 10 runs is 46.9907 ms (host walltime is 47.4733 ms, 99% percentile time is 47.0837).
[I] Average over 10 runs is 47.0208 ms (host walltime is 47.4603 ms, 99% percentile time is 47.412).
[I] Average over 10 runs is 47.0519 ms (host walltime is 47.5603 ms, 99% percentile time is 47.3422).
[I] Average over 10 runs is 47.0578 ms (host walltime is 47.5569 ms, 99% percentile time is 47.3402).
[I] Average over 10 runs is 47.0007 ms (host walltime is 48.5914 ms, 99% percentile time is 47.2236).
[I] Average over 10 runs is 47.1327 ms (host walltime is 49.0836 ms, 99% percentile time is 47.4973).
With --int8:
[I] Average over 10 runs is 48.3438 ms (host walltime is 48.8078 ms, 99% percentile time is 50.5598).
[I] Average over 10 runs is 46.9473 ms (host walltime is 47.3721 ms, 99% percentile time is 47.3528).
[I] Average over 10 runs is 47.0019 ms (host walltime is 47.4794 ms, 99% percentile time is 47.0948).
[I] Average over 10 runs is 47.0335 ms (host walltime is 47.4821 ms, 99% percentile time is 47.3957).
[I] Average over 10 runs is 46.9847 ms (host walltime is 47.4697 ms, 99% percentile time is 47.0352).
[I] Average over 10 runs is 47.018 ms (host walltime is 51.2739 ms, 99% percentile time is 47.2184).
[I] Average over 10 runs is 47.0777 ms (host walltime is 48.3681 ms, 99% percentile time is 47.4442).
[I] Average over 10 runs is 47.0309 ms (host walltime is 47.6396 ms, 99% percentile time is 47.3549).
[I] Average over 10 runs is 47.0289 ms (host walltime is 47.5107 ms, 99% percentile time is 47.2661).
[I] Average over 10 runs is 47.0261 ms (host walltime is 47.4977 ms, 99% percentile time is 47.2532).
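The speed-up is not noticeable: both logs settle at about 47 ms per batch. A likely reason is that not every NVIDIA GPU supports FP16, or that FP16 mode does not bring a speed-up on every GPU. As far as I know, TensorRT's hardware support matrix does not list FP16 or INT8 acceleration for compute capability 5.0 devices, which would be consistent with these results.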
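To compare the two logs numerically rather than by eye, the following sketch parses trtexec's "Average over N runs" lines and averages them. The names `parse_trtexec_log` and `mean_avg_ms` are hypothetical, the regular expression is written against the log format shown above (other trtexec versions may print differently), and only the first three lines of each log are included to keep the example short. Skipping the first line as warm-up is my own assumption.

```python
import re
from statistics import mean

# Matches lines such as:
# [I] Average over 10 runs is 48.4361 ms (host walltime is 50.252 ms, 99% percentile time is 50.1577).
LOG_RE = re.compile(
    r"Average over (\d+) runs is ([\d.]+) ms "
    r"\(host walltime is ([\d.]+) ms, 99% percentile time is ([\d.]+)\)"
)


def parse_trtexec_log(text):
    """Return (runs, avg_ms, host_walltime_ms, p99_ms) for each matching line."""
    return [
        (int(m.group(1)), float(m.group(2)), float(m.group(3)), float(m.group(4)))
        for m in LOG_RE.finditer(text)
    ]


def mean_avg_ms(rows, skip_first=0):
    """Mean of the per-line averages, optionally skipping warm-up lines (assumption)."""
    return mean(row[1] for row in rows[skip_first:])


fp16_log = """
[I] Average over 10 runs is 48.4361 ms (host walltime is 50.252 ms, 99% percentile time is 50.1577).
[I] Average over 10 runs is 46.9483 ms (host walltime is 47.3756 ms, 99% percentile time is 47.2867).
[I] Average over 10 runs is 46.9146 ms (host walltime is 47.3432 ms, 99% percentile time is 47.2326).
"""
int8_log = """
[I] Average over 10 runs is 48.3438 ms (host walltime is 48.8078 ms, 99% percentile time is 50.5598).
[I] Average over 10 runs is 46.9473 ms (host walltime is 47.3721 ms, 99% percentile time is 47.3528).
[I] Average over 10 runs is 47.0019 ms (host walltime is 47.4794 ms, 99% percentile time is 47.0948).
"""

fp16_rows = parse_trtexec_log(fp16_log)
int8_rows = parse_trtexec_log(int8_log)
print(f"fp16 mean: {mean_avg_ms(fp16_rows, skip_first=1):.4f} ms")
print(f"int8 mean: {mean_avg_ms(int8_rows, skip_first=1):.4f} ms")
```

Pasting the full console output into the two strings gives a comparison over all ten lines of each run.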

Reposted from www.cnblogs.com/zy-ss-pku-cn/p/12607479.html