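System: Ubuntu 22.04 on an RK3588 development board. Following the official procedure, I converted my own trained model to .rknn format, cross-compiled the program and deployed it to the board. When I ran it, inference succeeded and produced the output image and segmentation results, but an error was reported partway through: Failed to call RockChipRga interface, please use 'dmesg' command to view driver error log.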
1. Full error output
The complete output is as follows:
load lable ./model/coco_80_labels_list.txt
model input num: 1, output num: 13
input tensors:
index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
output tensors:
index=0, name=587, n_dims=4, dims=[1, 64, 80, 80], n_elems=409600, size=409600, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=19, scale=0.103071
index=1, name=onnx::ReduceSum_595, n_dims=4, dims=[1, 2, 80, 80], n_elems=12800, size=12800, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003492
index=2, name=600, n_dims=4, dims=[1, 1, 80, 80], n_elems=6400, size=6400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003492
index=3, name=566, n_dims=4, dims=[1, 32, 80, 80], n_elems=204800, size=204800, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=11, scale=0.019777
index=4, name=607, n_dims=4, dims=[1, 64, 40, 40], n_elems=102400, size=102400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=11, scale=0.102898
index=5, name=onnx::ReduceSum_615, n_dims=4, dims=[1, 2, 40, 40], n_elems=3200, size=3200, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003167
index=6, name=619, n_dims=4, dims=[1, 1, 40, 40], n_elems=1600, size=1600, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003167
index=7, name=573, n_dims=4, dims=[1, 32, 40, 40], n_elems=51200, size=51200, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=47, scale=0.027842
index=8, name=626, n_dims=4, dims=[1, 64, 20, 20], n_elems=25600, size=25600, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=32, scale=0.081311
index=9, name=onnx::ReduceSum_634, n_dims=4, dims=[1, 2, 20, 20], n_elems=800, size=800, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.002693
index=10, name=638, n_dims=4, dims=[1, 1, 20, 20], n_elems=400, size=400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.002693
index=11, name=580, n_dims=4, dims=[1, 32, 20, 20], n_elems=12800, size=12800, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-2, scale=0.048666
index=12, name=559, n_dims=4, dims=[1, 32, 160, 160], n_elems=819200, size=819200, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-119, scale=0.031364
model is NHWC input fmt
model input height=640, width=640, channel=3
scale=1.000000 dst_box=(0 140 639 499) allow_slight_change=1 _left_offset=0 _top_offset=140 padding_w=0 padding_h=280
rga_api version 1.10.1_[0]
fill dst image (x y w h)=(0 0 640 640) with color=0x72727272
RgaCollorFill(1819) RGA_COLORFILL fail: Invalid argument
RgaCollorFill(1820) RGA_COLORFILL fail: Invalid argument
374 im2d_rga_impl rga_task_submit(2171): Failed to call RockChipRga interface, please use 'dmesg' command to view driver error log.
374 im2d_rga_impl rga_dump_channel_info(1500): src_channel:
rect[x,y,w,h] = [0, 0, 0, 0]
image[w,h,ws,hs,f] = [0, 0, 0, 0, rgba8888]
buffer[handle,fd,va,pa] = [0, 0, 0, 0]
color_space = 0x0, global_alpha = 0x0, rd_mode = 0x0
374 im2d_rga_impl rga_dump_channel_info(1500): dst_channel:
rect[x,y,w,h] = [0, 0, 640, 640]
image[w,h,ws,hs,f] = [640, 640, 640, 640, rgb888]
buffer[handle,fd,va,pa] = [10, 0, 0, 0]
color_space = 0x0, global_alpha = 0xff, rd_mode = 0x1
374 im2d_rga_impl rga_dump_opt(1550): opt version[0x0]:
374 im2d_rga_impl rga_dump_opt(1551): set_core[0x0], priority[0]
374 im2d_rga_impl rga_dump_opt(1554): color[0x72727272]
374 im2d_rga_impl rga_dump_opt(1563):
374 im2d_rga_impl rga_task_submit(2180): acquir_fence[-1], release_fence_ptr[0x0], usage[0x280000]
rknn_run
-- matmul_by_cpu_uint8 use: 43.733002 ms
-- resize_by_opencv_uint8 use: 7.195000 ms
-- crop_mask_uint8 use: 2.451000 ms
-- seg_reverse use: 1.316000 ms
stem @ (239 50 251 90) 0.869
stem @ (408 90 414 123) 0.814
fruits @ (373 124 464 335) 0.808
fruits @ (219 90 279 231) 0.808
write_image path: out.png width=640 height=360 channel=3 data=0x7f890f7010
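The letterbox values printed just before the RGA failure can be reproduced with a few lines of arithmetic. These are scale=1.000000, dst_box=(0 140 639 499), padding_h=280 and _top_offset=140. The sketch below makes two assumptions. First, it assumes a centered letterbox, which the even 140/280 split in the log suggests. Second, it ignores the demo's allow_slight_change alignment adjustment. The struct and function names are hypothetical and are not the rknn_model_zoo API. The point is to make the difference between the two test images visible: a 640x360 input leaves 280 rows to pad, while a 640x640 input leaves none.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical struct for illustration; fields mirror the values the demo logs,
// not necessarily the demo's real types.
struct LetterBox {
    float scale;
    int resize_w, resize_h;
    int padding_w, padding_h;
    int left, top;        // _left_offset / _top_offset in the log
    int x1, y1, x2, y2;   // dst_box, inclusive corner coordinates
};

// Centered letterbox: scale the source to fit inside the model input while
// keeping its aspect ratio, then split the leftover space evenly on both sides.
LetterBox compute_letterbox(int src_w, int src_h, int dst_w, int dst_h) {
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
    LetterBox lb{};
    lb.scale = std::min(static_cast<float>(dst_w) / src_w,
                        static_cast<float>(dst_h) / src_h);
    lb.resize_w = static_cast<int>(src_w * lb.scale);
    lb.resize_h = static_cast<int>(src_h * lb.scale);
    lb.padding_w = dst_w - lb.resize_w;
    lb.padding_h = dst_h - lb.resize_h;
    lb.left = lb.padding_w / 2;
    lb.top = lb.padding_h / 2;
    lb.x1 = lb.left;
    lb.y1 = lb.top;
    lb.x2 = lb.left + lb.resize_w - 1;
    lb.y2 = lb.top + lb.resize_h - 1;
    return lb;
}

// Any padding means the canvas has to be pre-filled with a color first.
bool needs_fill(const LetterBox& lb) {
    return lb.padding_w > 0 || lb.padding_h > 0;
}
```

For compute_letterbox(640, 360, 640, 640) this yields padding_h = 280, top = 140 and a box of (0, 140, 639, 499), the same as the log. For a 640x640 input, needs_fill is false. Our reading of the log, not verified against the demo's source, is that the RGA color fill ("fill dst image ... with color=0x72727272") is only issued when there is padding to fill. That would explain why square inputs never hit the error.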
2. Troubleshooting
- As the error message suggests, I watched the kernel log with dmesg while rerunning the failing case in another terminal:
dmesg -w
[ 4626.900161] rga_policy: invalid function policy
[ 4626.900171] rga_job: job assign failed
[ 4626.900173] rga_job: failed to get scheduler, rga_job_commit(407)
[ 4626.900178] rga_job: request[13] task[0] job_commit failed.
[ 4626.900181] rga_job: rga request commit failed!
[ 4626.900183] rga: request[13] submit failed!
- My reading of this was that the requested memory was too large. The official advice is to refer to: https://gitee.com/hihope-rk3588/rk3588-librga/blob/master/samples/allocator_demo/src/rga_allocator_dma32_demo.cpp
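- I then checked the source pulled from Rockchip's official repository, rknn_model_zoo/examples/yolov8_seg/cpp. It does allocate the memory within 4G, so that should not be the problem.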
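As we understand the librga FAQ, the RGA2 core on RK3588 can only address memory below 4 GiB (32-bit addresses). That is why a DMA32 allocator is referenced above: a buffer that lands above 4 GiB cannot be processed by that core. The hypothetical helper below only spells out that range check. In userspace the physical address is usually not visible; the log shows pa = 0 and a buffer handle instead. Treat it as an illustration of the constraint, not a diagnostic tool.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound of the 32-bit address space (4 GiB).
constexpr uint64_t kFourGiB = 1ULL << 32;

// Hypothetical helper: true if [phys_addr, phys_addr + size) lies entirely
// below 4 GiB. Written as a subtraction to avoid overflow in phys_addr + size.
bool within_4g(uint64_t phys_addr, uint64_t size) {
    return phys_addr < kFourGiB && size <= kFourGiB - phys_addr;
}
```

For example, a 640x640 RGB888 buffer (1228800 bytes) starting at 0x10000000 passes. The same buffer starting 1 KiB below the 4 GiB boundary does not.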
3. Comparison tests
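- The checks above turned up nothing wrong, so I began to suspect the official code itself. To get a comparison, I downloaded the official YOLOv8 model file yolov8x-seg.pt and converted it to .onnx and .rknn. I then ran it on the official test image bus.jpg, and it worked without any problem.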
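- This showed that the official source code was fine. Next I ran the official model on one of my own images; bus.jpg is 640x640, while my test image is 640x360. This produced the same Failed to call RockChipRga interface error.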
- Could the different resolution of the test images be the cause? I padded my test image to 640x640 with black and tested again, and the error was gone.
- Running my own trained model on the padded image also worked without errors.
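The workaround from the last two steps can also be done in code instead of by hand: pad the 640x360 image to 640x640 before it reaches the demo. This is a minimal sketch that assumes a tightly packed RGB888 buffer already decoded in memory, with no row padding. The function name pad_rgb888 is hypothetical. The image is centered to mirror the demo's letterbox, although the text above does not say where the black padding was placed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies a packed RGB888 image (3 bytes per pixel) onto a dst_w x dst_h canvas
// pre-filled with `gray`, centered. No scaling: the source must already fit.
std::vector<uint8_t> pad_rgb888(const uint8_t* src, int src_w, int src_h,
                                int dst_w, int dst_h, uint8_t gray) {
    assert(src != nullptr);
    assert(src_w > 0 && src_h > 0 && src_w <= dst_w && src_h <= dst_h);
    const size_t kChannels = 3;
    // Fill the whole canvas on the CPU; this replaces the RGA color fill step.
    std::vector<uint8_t> dst(static_cast<size_t>(dst_w) * dst_h * kChannels, gray);
    const int left = (dst_w - src_w) / 2;
    const int top = (dst_h - src_h) / 2;
    const size_t src_row = static_cast<size_t>(src_w) * kChannels;
    for (int y = 0; y < src_h; ++y) {
        // Destination start of source row y inside the canvas.
        uint8_t* out = dst.data() +
            (static_cast<size_t>(top + y) * dst_w + left) * kChannels;
        std::memcpy(out, src + static_cast<size_t>(y) * src_row, src_row);
    }
    return dst;
}
```

Passing gray = 0 gives the black padding described above. Passing 114 (0x72) would match the fill color the demo hands to RGA in the log. Because the padded 640x640 result has nothing left to letterbox, the demo should not need its own color fill for it.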
Conclusion
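My guess is that the preprocessing code does not handle non-square input images correctly. The failing call is the RGA color fill (RgaCollorFill) that pads the 640x360 image into the 640x640 letterbox canvas, and it runs before rknn_run. This part of the code needs to be modified.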
My comparison results
PS: bus.jpg is the official 640x640 image, ExperimentData0003.png is my 640x360 test image, and output_image.jpg is my test image padded to 640x640.
Using Rockchip's official cross-compiled demo:
(√) ./rknn_yolov8_seg_demo model/yolov8x-seg.rknn model/bus.jpg
(inference succeeds, but reports Failed to call RockChipRga interface) ./rknn_yolov8_seg_demo model/yolov8x-seg.rknn model/ExperimentData0003.png
(√) ./rknn_yolov8_seg_demo model/yolov8x-seg.rknn model/output_image.jpg
(√) ./rknn_yolov8_seg_demo model/yolov8x-seg.rknn model/output_image.png
(X, Segmentation fault) ./rknn_yolov8_seg_demo model/tomato_seg.rknn model/bus.jpg
(inference succeeds, but reports Failed to call RockChipRga interface) ./rknn_yolov8_seg_demo model/tomato_seg.rknn model/ExperimentData0003.png
(X, Segmentation fault) ./rknn_yolov8_seg_demo model/tomato_seg.rknn model/output_image.jpg
(√) ./rknn_yolov8_seg_demo model/tomato_seg.rknn model/output_image.png
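1. The error depends on the input image size: keep the width and height equal where possible (e.g. 640x640), otherwise RGA reports an error.
2. With my own trained model, run inference on PNG images; with JPG images it crashes with Segmentation fault.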
https://github.com/airockchip/librga/blob/main/docs/Rockchip_FAQ_RGA_CN.md