Notes on installing a TensorFlow Serving server

1. Installation environment

CentOS
CUDA 9
cuDNN 7
TensorFlow Serving r1.12 and TensorFlow 1.12, both built from source.

2. Installing tf_serving (non-GPU build)

  • Installing tf_serving (non-GPU build) (CUDA 9, cuDNN 7)
    • Build steps
      • git clone -b r1.3 --recurse-submodules https://github.com/tensorflow/serving (use the branch matching your target version, e.g. r1.12)
      • Enter the serving/tensorflow directory and run ./configure to generate the build configuration
      • bazel build -c opt --config=cuda tensorflow_serving/...
        For this CPU-only build, what actually compiled successfully was bazel build -c opt tensorflow_serving/... (without --config=cuda)
    • Problems hit while building tf_serving, and their fixes
      • 1. no such package '@protobuf//': java.io.IOException: Error downloading
        Fix: delete the stale download URL from the workspace file:
        sed -i '\@https://github.com/google/protobuf/archive/0b059a3d8a8f8aa40dde7bea55edca4ec5dfea66.tar.gz@d' tensorflow/workspace.bzl
      • 2. no such target '@org_tensorflow//third_party/gpus/crosstool'
        Fix:
        • Edit the tools/bazel.rc file and change @org_tensorflow//third_party/gpus/crosstool
          to @local_config_cuda//crosstool:toolchain
        • Run: bazel clean --expunge && export TF_NEED_CUDA=1
        • Run: bazel query 'kind(rule, @local_config_cuda//...)'
      • 3. fatal error: stropts.h: No such file or directory
        Fix:
        • vim tensorflow/third_party/curl.BUILD
        • remove the line containing:
        • define HAVE_STROPTS_H 1
      • 4. If you run into permission problems under /tmp, point the build at a writable directory with export TMPDIR=XXX.
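The sed command in problem 1 uses the '\@regex@d' address form: '@' replaces '/' as the regex delimiter, so the slashes in the URL need no escaping, and 'd' deletes every matching line. A minimal, self-contained demonstration on a scratch file (the file name and its contents are illustrative, not the real workspace.bzl):

```shell
# Demonstrate the '\@regex@d' sed address used above: '@' is the delimiter,
# so slashes in the URL need no escaping, and 'd' deletes matching lines.
cat > workspace_demo.bzl <<'EOF'
urls = [
    "https://github.com/google/protobuf/archive/0b059a3d.tar.gz",
    "https://mirror.example.org/protobuf.tar.gz",
]
EOF
sed -i '\@https://github.com/google/protobuf@d' workspace_demo.bzl
grep -c 'mirror.example.org' workspace_demo.bzl   # prints 1: the other URL survives
```

The same pattern works for any mirror URL in workspace.bzl that 404s; only lines matching the regex are removed, the rest of the file is untouched.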

3. Installing tf_serving (GPU build)

  • Installing tf_serving (GPU build) (CUDA 9, cuDNN 7)

    • Build steps
      • bazel build -c opt --config=cuda tensorflow_serving/...
      • Final build command: export TF_NEED_CUDA=1 && export TMPDIR=/home/work/XXX/tools/serving/tmp && /home/work/XXX/bin/bazel build -c opt --config=cuda tensorflow_serving/...
    • Problems and fixes
      • 1. ERROR: Building with --config=cuda but TensorFlow is not configured to build with GPU support (this appeared even though GPU support was enabled during ./configure)
        Fix: export TF_NEED_CUDA=1. The full command becomes export TF_NEED_CUDA=1 && /home/work/XXX/bin/bazel build -c opt --config=cuda tensorflow_serving/...
      • 2. fatal error: third_party/nccl/nccl.h: No such file or directory
        Fix: first install NCCL, then set two variables (use the version you actually installed):
        export TF_NCCL_VERSION='2.1.15'
        export NCCL_INSTALL_PATH=/usr/local/nccl2 (my preferred path)
        The full command becomes: export TF_NEED_CUDA=1 && export TMPDIR=/home/work/XXX/tools/serving/tmp && export TF_NCCL_VERSION=2.3.4 && export NCCL_INSTALL_PATH=/usr/local/nccl_2.3.4 && /home/work/XXX/bin/bazel build -c opt --config=cuda tensorflow_serving/...
      • 3. error adding symbols: DSO missing from command line
        Fix: edit the util/net_http/server/testing/BUILD and tensorflow_serving/util/net_http/client/testing/BUILD files, adding 'linkopts = ["-lm"],' to the cc_binary rule, i.e. cc_binary( ... linkopts = ["-lm"], ), leaving the existing settings unchanged.
      • 4. error: possibly undefined macro: AC_PROG_LIBTOOL
        Fix: install libtool (with yum on CentOS, apt-get on Ubuntu)
      • 5. When rebuilding tf_serving on the same machine, the NCCL variables appear to be set already, yet the build fails with "NCCL_HDR_PATH" not found in dictionary
        Fix: re-export TF_NCCL_VERSION, NCCL_HDR_PATH, and the related variables. For example:
        export TF_NEED_CUDA=1 && export TMPDIR=/home/XXX/tmp && export TF_NCCL_VERSION=2.3.4 && export NCCL_INSTALL_PATH=/home/XXX/tools/nccl_2.3.4/lib && export NCCL_HDR_PATH=/home/XXX/tools/nccl_2.3.4/include && /home/XXX/bin/bazel build -c opt --config=cuda tensorflow_serving/...
      
      • 6. undefined reference to symbol 'XXX@@GLIBC_2.2.5'
        Fix (see https://github.com/tensorflow/tensorflow/issues/2291): locate the relevant file with grep -rn "LINK_OPTS" */*/*/* and change the entry to "//conditions:default": ["-lpthread","-lrt","-lm"],. Use the path in the error message to find the BUILD file to modify.
        For example, if the error reads ERROR: /home/XXX/tools/serving/tensorflow_serving/util/net_http/socket/testing/BUILD, then edit that socket/testing/BUILD file.
      • 7. error adding symbols: Bad value, described as relocation R_X86_64_32 against '.rodata' can not be used when making a shared object; recompile with -fPIC
        Fix: add --copt="-fPIC" to the build. The full command becomes
      export TF_NEED_CUDA=1 && export TMPDIR=/home/XXX/tmp && export TF_NCCL_VERSION=2.3.4 && export NCCL_INSTALL_PATH=/home/XXX/tools/nccl_2.3.4/lib && export NCCL_HDR_PATH=/home/XXX/tools/nccl_2.3.4/include && /home/XXX/bin/bazel build -c opt --config=cuda --copt="-fPIC" tensorflow_serving/...
      
  • Starting the tf_serving server

    • bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=bert --model_base_path=/XXX/model/bert/save_model
    • To control which GPUs TensorFlow Serving can see, set export CUDA_VISIBLE_DEVICES="0"
  • When running the client program, the error NDIMS == dims() (2 vs. 4): Asking for tensor of 2 dimensions from a tensor of 4 dimensions appeared

    • This is most likely a TensorFlow version mismatch; align the client's TensorFlow version with the server's, e.g. move to 1.12.
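To confirm the server came up correctly before debugging the client, one quick check is the model-status endpoint of the REST API (available in TF Serving since 1.8, alongside gRPC). This is a sketch: the model name and path are the placeholders from the launch command above, and the REST port is my own choice:

```shell
# Launch the server in the background with both gRPC and REST ports enabled,
# then ask for the model's status. Model name and path are placeholders.
export CUDA_VISIBLE_DEVICES="0"
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
    --port=9000 --rest_api_port=9001 \
    --model_name=bert --model_base_path=/XXX/model/bert/save_model &
sleep 5   # give the server a moment to load the SavedModel
curl http://localhost:9001/v1/models/bert   # a healthy model reports state AVAILABLE
```

If the status call succeeds but a prediction still fails with the NDIMS error, the problem is on the client side (tensor shape or TensorFlow version), not in the server build.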

Below is the script for building Serving with TensorRT:

export TF_NEED_CUDA=1 
export TMPDIR=/home/xxx/tools/serving_tensorrt/tmp 
export TF_NCCL_VERSION=2.3.4 
export TF_NEED_TENSORRT=1
export TF_TENSORRT_VERSION=5.0.2           
export TENSORRT_INSTALL_PATH=/home/xxx/tools/TensorRT-5.0.2.6/lib 
#export TENSORRT_BIN_PATH=/home/xxx/tools/TensorRT-5.0.2.6 
export NCCL_INSTALL_PATH=/home/xxx/tools/nccl_2.3.4/lib 
export NCCL_HDR_PATH=/home/xxx/tools/nccl_2.3.4/include 

/home/xxx/bin/bazel build -c opt \
    --copt="-fPIC" \
    --config=cuda tensorflow_serving/...
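After the TensorRT build finishes, a sanity check I would run is to confirm the binary exists and is dynamically linked against the GPU stacks it was built for (the exact library sonames vary by CUDA/TensorRT/NCCL version, so the grep pattern is only a guess):

```shell
# Confirm the server binary was produced and links the expected GPU libraries.
BIN=bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server
ls -lh "$BIN"
ldd "$BIN" | grep -Ei 'nvinfer|nccl|cuda'   # expect TensorRT, NCCL, and CUDA entries
```

If ldd reports "not found" for any of these, add the corresponding lib directories (e.g. the TensorRT-5.0.2.6/lib path above) to LD_LIBRARY_PATH before starting the server.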



Reposted from blog.csdn.net/yiyele/article/details/89882953