NVIDIA TensorRT


Once you have a trained model, the next step is serving it for online prediction (inference). At that point, your system's latency and throughput become the key concerns.

NVIDIA's TensorRT is built to accelerate inference. It applies targeted optimizations to several common network architectures, for example fusing the Conv, BN, and ReLU operations into a single operation (fused network layers).
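Fusing Conv + BN is possible because, at inference time, batch norm is just an affine transform that can be folded into the convolution's weights and bias. Below is a minimal sketch of that arithmetic in plain Python, with a single scalar standing in for one output channel of a conv kernel; this illustrates the math only, not TensorRT's actual implementation:

```python
import math

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm parameters into the preceding conv's weight/bias.

    BN(conv(x)) == conv'(x), where:
      w' = w * gamma / sqrt(var + eps)
      b' = (b - mean) * gamma / sqrt(var + eps) + beta
    """
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta

def conv_bn_relu_unfused(x, w, b, gamma, beta, mean, var, eps=1e-5):
    y = w * x + b                                          # conv (one channel)
    y = (y - mean) / math.sqrt(var + eps) * gamma + beta   # batch norm
    return max(0.0, y)                                     # ReLU

def conv_bn_relu_fused(x, w, b, gamma, beta, mean, var, eps=1e-5):
    # Three layers collapsed into a single multiply-add + clamp.
    wf, bf = fold_bn_into_conv(w, b, gamma, beta, mean, var, eps)
    return max(0.0, wf * x + bf)
```

The fused version computes the identical result with one kernel launch instead of three, which is where much of the inference speedup comes from.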



In terms of speed, single-image inference is about 3x faster, and batch-16 inference is about 0.7x faster (i.e., roughly 1.7x the original speed).



An image-recognition demo built on TensorRT is available at https://github.com/dusty-nv/jetson-inference.git. I ran a quick test on a 920x920 image: loading the image (CPU) took 42 ms, while prediction (GPU) took only about 5 ms; the output is shown below. For comparison, Caffe's native "examples/cpp_classification/classification.bin" needed 22 ms.

[maofeng@gpu /home/maofeng/tensorrt/jetson-inference/build/x86_64/bin]
$./imagenet-console banana_0.jpg 
imagenet-console
  args (2):  0 [./imagenet-console]  1 [banana_0.jpg]  

[GIE]  attempting to open cache file networks/bvlc_googlenet.caffemodel.2.tensorcache
[GIE]  cache file not found, profiling network model
[GIE]  platform does not have FP16 support.
[GIE]  loading networks/googlenet.prototxt networks/bvlc_googlenet.caffemodel
[GIE]  retrieved output tensor 'prob'
[GIE]  configuring CUDA engine
[GIE]  building CUDA engine
[GIE]  completed building CUDA engine
[GIE]  network profiling complete, writing cache to networks/bvlc_googlenet.caffemodel.2.tensorcache
[GIE]  completed writing cache to networks/bvlc_googlenet.caffemodel.2.tensorcache
[GIE]  networks/bvlc_googlenet.caffemodel loaded
[GIE]  CUDA engine context initialized with 2 bindings
[GIE]  networks/bvlc_googlenet.caffemodel input  binding index:  0
[GIE]  networks/bvlc_googlenet.caffemodel input  dims (b=2 c=3 h=224 w=224) size=1204224
[cuda]  cudaAllocMapped 1204224 bytes, CPU 0x204500000 GPU 0x204500000
[GIE]  networks/bvlc_googlenet.caffemodel output 0 prob  binding index:  1
[GIE]  networks/bvlc_googlenet.caffemodel output 0 prob  dims (b=2 c=1000 h=1 w=1) size=8000
[cuda]  cudaAllocMapped 8000 bytes, CPU 0x204640000 GPU 0x204640000
networks/bvlc_googlenet.caffemodel initialized.
[GIE]  googlenet loaded
imageNet -- loaded 1000 class info entries
googlenet initialized.
loaded image  banana_0.jpg  (920 x 920)  13542400 bytes
[cuda]  cudaAllocMapped 13542400 bytes, CPU 0x204740000 GPU 0x204740000
[GIE]  layer conv1/7x7_s2 + conv1/relu_7x7 - 0.760960 ms
[GIE]  layer pool1/3x3_s2 - 0.039424 ms
[GIE]  layer pool1/norm1 - 0.046944 ms
[GIE]  layer conv2/3x3_reduce + conv2/relu_3x3_reduce - 0.034752 ms
[GIE]  layer conv2/3x3 + conv2/relu_3x3 - 0.114464 ms
[GIE]  layer conv2/norm2 - 0.082400 ms
[GIE]  layer pool2/3x3_s2 - 0.025760 ms
[GIE]  layer inception_3a/1x1 + inception_3a/relu_1x1 - 0.025248 ms
[GIE]  layer inception_3a/pool - 0.013440 ms
[GIE]  layer inception_3a/3x3_reduce + inception_3a/relu_3x3_reduce||inception_3a/5x5_reduce + inception_3a/relu_5x5_reduce - 0.060128 ms
[GIE]  layer inception_3a/pool_proj + inception_3a/relu_pool_proj - 0.048096 ms
[GIE]  layer inception_3a/3x3 + inception_3a/relu_3x3 - 0.052256 ms
[GIE]  layer inception_3a/5x5 + inception_3a/relu_5x5 - 0.023680 ms
[GIE]  layer inception_3a/output - 0.007168 ms
[GIE]  layer inception_3b/1x1 + inception_3b/relu_1x1 - 0.067552 ms
[GIE]  layer inception_3b/pool - 0.016480 ms
[GIE]  layer inception_3b/3x3_reduce + inception_3b/relu_3x3_reduce||inception_3b/5x5_reduce + inception_3b/relu_5x5_reduce - 0.068224 ms
[GIE]  layer inception_3b/pool_proj + inception_3b/relu_pool_proj - 0.032832 ms
[GIE]  layer inception_3b/3x3 + inception_3b/relu_3x3 - 0.069856 ms
[GIE]  layer inception_3b/5x5 + inception_3b/relu_5x5 - 0.053024 ms
[GIE]  layer inception_3b/output - 0.001408 ms
[GIE]  layer pool3/3x3_s2 - 0.020864 ms
[GIE]  layer inception_4a/1x1 + inception_4a/relu_1x1 - 0.038368 ms
[GIE]  layer inception_4a/pool - 0.011520 ms
[GIE]  layer inception_4a/3x3_reduce + inception_4a/relu_3x3_reduce||inception_4a/5x5_reduce + inception_4a/relu_5x5_reduce - 0.028128 ms
[GIE]  layer inception_4a/pool_proj + inception_4a/relu_pool_proj - 0.032704 ms
[GIE]  layer inception_4a/3x3 + inception_4a/relu_3x3 - 0.045248 ms
[GIE]  layer inception_4a/5x5 + inception_4a/relu_5x5 - 0.017728 ms
[GIE]  layer inception_4a/output - 0.001376 ms
[GIE]  layer inception_4b/1x1 + inception_4b/relu_1x1 - 0.037760 ms
[GIE]  layer inception_4b/pool - 0.014528 ms
[GIE]  layer inception_4b/3x3_reduce + inception_4b/relu_3x3_reduce||inception_4b/5x5_reduce + inception_4b/relu_5x5_reduce - 0.031712 ms
[GIE]  layer inception_4b/pool_proj + inception_4b/relu_pool_proj - 0.028864 ms
[GIE]  layer inception_4b/3x3 + inception_4b/relu_3x3 - 0.049984 ms
[GIE]  layer inception_4b/5x5 + inception_4b/relu_5x5 - 0.022880 ms
[GIE]  layer inception_4b/output - 0.001568 ms
[GIE]  layer inception_4c/1x1 + inception_4c/relu_1x1 - 0.032192 ms
[GIE]  layer inception_4c/pool - 0.015520 ms
[GIE]  layer inception_4c/3x3_reduce + inception_4c/relu_3x3_reduce||inception_4c/5x5_reduce + inception_4c/relu_5x5_reduce - 0.033920 ms
[GIE]  layer inception_4c/pool_proj + inception_4c/relu_pool_proj - 0.029056 ms
[GIE]  layer inception_4c/3x3 + inception_4c/relu_3x3 - 0.049184 ms
[GIE]  layer inception_4c/5x5 + inception_4c/relu_5x5 - 0.023872 ms
[GIE]  layer inception_4c/output - 0.001536 ms
[GIE]  layer inception_4d/1x1 + inception_4d/relu_1x1 - 0.035872 ms
[GIE]  layer inception_4d/pool - 0.011008 ms
[GIE]  layer inception_4d/3x3_reduce + inception_4d/relu_3x3_reduce||inception_4d/5x5_reduce + inception_4d/relu_5x5_reduce - 0.037824 ms
[GIE]  layer inception_4d/pool_proj + inception_4d/relu_pool_proj - 0.031584 ms
[GIE]  layer inception_4d/3x3 + inception_4d/relu_3x3 - 0.052544 ms
[GIE]  layer inception_4d/5x5 + inception_4d/relu_5x5 - 0.021536 ms
[GIE]  layer inception_4d/output - 0.007072 ms
[GIE]  layer inception_4e/1x1 + inception_4e/relu_1x1 - 0.049760 ms
[GIE]  layer inception_4e/pool - 0.011392 ms
[GIE]  layer inception_4e/3x3_reduce + inception_4e/relu_3x3_reduce||inception_4e/5x5_reduce + inception_4e/relu_5x5_reduce - 0.035776 ms
[GIE]  layer inception_4e/pool_proj + inception_4e/relu_pool_proj - 0.040832 ms
[GIE]  layer inception_4e/3x3 + inception_4e/relu_3x3 - 0.058048 ms
[GIE]  layer inception_4e/5x5 + inception_4e/relu_5x5 - 0.029600 ms
[GIE]  layer inception_4e/output - 0.001376 ms
[GIE]  layer pool4/3x3_s2 - 0.016000 ms
[GIE]  layer inception_5a/1x1 + inception_5a/relu_1x1 - 0.042528 ms
[GIE]  layer inception_5a/pool - 0.011488 ms
[GIE]  layer inception_5a/3x3_reduce + inception_5a/relu_3x3_reduce||inception_5a/5x5_reduce + inception_5a/relu_5x5_reduce - 0.037088 ms
[GIE]  layer inception_5a/pool_proj + inception_5a/relu_pool_proj - 0.040896 ms
[GIE]  layer inception_5a/3x3 + inception_5a/relu_3x3 - 0.057024 ms
[GIE]  layer inception_5a/5x5 + inception_5a/relu_5x5 - 0.020128 ms
[GIE]  layer inception_5a/output - 0.001376 ms
[GIE]  layer inception_5b/1x1 + inception_5b/relu_1x1 - 0.045248 ms
[GIE]  layer inception_5b/pool - 0.017312 ms
[GIE]  layer inception_5b/3x3_reduce + inception_5b/relu_3x3_reduce||inception_5b/5x5_reduce + inception_5b/relu_5x5_reduce - 0.037760 ms
[GIE]  layer inception_5b/pool_proj + inception_5b/relu_pool_proj - 0.035872 ms
[GIE]  layer inception_5b/3x3 + inception_5b/relu_3x3 - 0.064608 ms
[GIE]  layer inception_5b/5x5 + inception_5b/relu_5x5 - 0.026592 ms
[GIE]  layer inception_5b/output - 0.001760 ms
[GIE]  layer pool5/7x7_s1 - 0.010912 ms
[GIE]  layer loss3/classifier - 0.174848 ms
[GIE]  layer prob - 0.034240 ms
[GIE]  layer network time - 3.312513 ms
class 0954 - 0.998960  (banana)
cost 42045 4643
imagenet-console:  'banana_0.jpg' -> 99.89600% class #954 (banana)

shutting down...
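The per-layer timings in the log above can be summed to recover the reported network time (~3.31 ms). Here is a small sketch that parses the "[GIE]  layer ... - X ms" lines; the helper name and regex are my own, not part of jetson-inference:

```python
import re

# Matches per-layer timing lines such as:
#   [GIE]  layer prob - 0.034240 ms
LAYER_RE = re.compile(r"\[GIE\]\s+layer\s+(?P<name>.+?)\s+-\s+(?P<ms>[\d.]+)\s+ms")

def sum_layer_times(log_text):
    """Sum per-layer GPU times (in ms) from a GIE/TensorRT profiling log."""
    total = 0.0
    for line in log_text.splitlines():
        m = LAYER_RE.search(line)
        # Skip the "network time" summary line so it isn't double-counted.
        if m and m.group("name") != "network time":
            total += float(m.group("ms"))
    return total
```

Summing the fused-layer entries this way makes it easy to see which layers dominate (here, conv1 and the final classifier).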


Reposted from blog.csdn.net/mao_feng/article/details/70243997