[Development] Summary of Common Problems and Errors

Python


Problem, Screenshot, Solution, Notes
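Problem: A required .so file cannot be found.
Solution: 1. Check that the environment variables are set correctly. 2. Locate the file with find / -name "{missing .so file}", then add its directory to the library search path: export LD_LIBRARY_PATH={DIRECTORY}:$LD_LIBRARY_PATH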
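The search-then-export procedure can also be scripted. The following is a minimal sketch, not part of the original post: the helper names `find_library_dirs` and `extend_ld_library_path` and the file `libdemo.so.1` are hypothetical, and the demo searches a throwaway directory rather than `/`.

```python
import os
import tempfile


def find_library_dirs(name, search_root):
    """Return the sorted directories under search_root that contain a file called name."""
    matches = set()
    for dirpath, _dirnames, filenames in os.walk(search_root):
        if name in filenames:
            matches.add(dirpath)
    return sorted(matches)


def extend_ld_library_path(dirs, current=""):
    """Prepend dirs to an existing LD_LIBRARY_PATH value without duplicating entries."""
    existing = [p for p in current.split(":") if p]
    merged = list(dirs) + [p for p in existing if p not in dirs]
    return ":".join(merged)


if __name__ == "__main__":
    # Demo on a temporary tree holding a fake (hypothetical) library file.
    with tempfile.TemporaryDirectory() as root:
        lib_dir = os.path.join(root, "opt", "demo", "lib")
        os.makedirs(lib_dir)
        open(os.path.join(lib_dir, "libdemo.so.1"), "w").close()
        found = find_library_dirs("libdemo.so.1", root)
        current = os.environ.get("LD_LIBRARY_PATH", "")
        print("export LD_LIBRARY_PATH=" + extend_ld_library_path(found, current))
```

Prepending rather than overwriting keeps any directories that were already on the search path.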
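Problem: libxml2.so.2: cannot open shared object file: No such file or directory
Solution: apt-get install libxml2 -y && apt-get install openssl libssl-dev -y
Note: On Debian/Ubuntu the OpenSSL development package is named libssl-dev; the name openssl-dev is used by other distributions.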
Problem: libgmpxx.so.4: cannot open shared object file: No such file or directory
Solution: apt-get install libgmpxx4ldbl
Problem: Installing h5py fails with:
error: Unable to load dependency HDF5, make sure HDF5 is installed properly
error: libhdf5.so: cannot open shared object file: No such file or directory
Solution: apt-get install libhdf5-dev -y
Problem: Building mmcv from source fails with:
ERROR: Could not find a version that satisfies the requirement pytest-runner
ERROR: No matching distribution found for pytest-runner
Solution: pip install pytest-runner
Note: Check the Dockerfile and make sure pytest-runner is installed before mmcv.
Problem: pip times out inside a Docker container: ERROR: No matching distribution found for numpy
Solution: 1. Use the host network: docker run --net host --name ubuntu -it ubuntu bash
2. Pass DNS servers explicitly: docker run --dns 8.8.8.8 --dns 8.8.4.4 --name ubuntu -it ubuntu bash
3. Change the DNS server.
Note: The container has no network access, so DNS resolution fails.
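Before changing any Docker options, it can help to confirm from inside the container that name resolution is really what fails. A minimal sketch using only the standard library follows; the helper name `can_resolve` is hypothetical and not from the original post.

```python
import socket


def can_resolve(hostname):
    """Return True if the system resolver can turn hostname into an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except (socket.gaierror, UnicodeError):
        # gaierror: the resolver failed; UnicodeError: the name itself is malformed.
        return False
```

Calling `can_resolve("pypi.org")` inside the container and getting `False`, while the same call succeeds on the host, points at the container's DNS configuration rather than at pip.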
Problem: Building a Docker image that uses the NVIDIA apt repository fails at apt-get update:
Reading package lists... Done
W: GPG error: https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1804/x86_64 Release: The following signatures were invalid: BADSIG F60F4B3D7FA2AF80 cudatools [email protected]
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 Release' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
Solution: RUN rm /etc/apt/sources.list.d/*
Note: This removes the NVIDIA sources; NVIDIA's DNS goes down from time to time.
Problem: Running an NVIDIA image with docker run fails with:
no such file or directory): exec: "nvidia-container-runtime": executable file not found in $PATH: : unknown.
Solution: apt-get install nvidia-container-runtime
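Problem: autogluon raises multiprocessing.context.TimeoutError
Solution: docker run --shm-size 4096m
Note: A multi-process DataLoader uses a large amount of shared memory, and Docker's default shared memory size is 64 MB.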
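To see how much shared memory a container actually has, the size of the filesystem mounted at `/dev/shm` can be read from Python. This is a rough sketch: the helper name `shared_memory_report` and the 1 GiB threshold are assumptions chosen for illustration, not values from the original post.

```python
import os
import shutil


def shared_memory_report(path="/dev/shm", min_bytes=1 << 30):
    """Return (total_bytes, is_enough) for the filesystem mounted at path.

    min_bytes defaults to 1 GiB, an arbitrary threshold picked for this sketch.
    """
    total = shutil.disk_usage(path).total
    return total, total >= min_bytes


if __name__ == "__main__":
    if os.path.isdir("/dev/shm"):
        total, enough = shared_memory_report()
        print(f"/dev/shm: {total / 2**20:.0f} MiB, enough={enough}")
```

In a container started without a shared memory option, the reported size should be close to Docker's 64 MB default.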
Problem: The SSH server fails to start on the platform with /lib/x86_64-linux-gnu/libc.so.6: version 'GLIBC_2.25' not found
Solution: See https://apulis-gitlab.apulis.cn/apulis/apulis-wiki/-/blob/master/algorithm/wiki/docker/How-to-write-a-better-dockerfile.md
Note: An Ubuntu 16.04 image prevents the SSH server from starting on the platform; switch to Ubuntu 18.04.
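Problem: GPU training fails with Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
Solution: config = tf.compat.v1.ConfigProto(gpu_options=tf.compat.v1.GPUOptions(allow_growth=True)), then pass config when creating the session.
Note: GPU memory has to be requested when the GPU is used; allow_growth makes TensorFlow allocate it on demand instead of reserving it all at startup.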
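Problem: horovod cannot use multiple GPUs: TensorFlow device (GPU:0) is being mapped to multiple CUDA devices
Solution: 1. Do not call list_gpu_devices from device_lib at the start of the program; once it has been called, multiple GPUs cannot be used. 2. When using MonitoredTrainingSession, run hvd.broadcast_global_variables after the global variables have been initialized.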
Problem: Python OpenCV reports that libGL.so.1 cannot be found.
Solution: sudo apt-get update && sudo apt-get install libgl1-mesa-glx -y
Note: Packages required by OpenCV are missing.
Problem: NotImplementedError: Cannot convert a symbolic Tensor (strided_slice_4:0) to a numpy array.
Note: The installed numpy version does not support symbolic tensors.
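Problem: RuntimeError: mindspore/ccsrc/transform/graph_ir/convert.cc:102 FindAdapter] Can't find OpAdapter for Div
Solution: Upgrade MindSpore.
Note: Exporting to AIR does not currently support networks with control-flow semantics, such as for, while, or if statements in the network's construct method. Converting CenterNet to AIR triggers this error; converting to MindIR does not.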
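Problem: The object_detection module cannot be found (no module find object_detection).
Solution: 1. Add the directory to the Python path: sys.path.append("./object_detection")
2. export PYTHONPATH=./object_detection
3. Copy the required dependency folders into /usr/local/lib/python/site-packages.
Note: The imported package cannot be found.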
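The `sys.path` approach can be sketched as follows. One point worth checking, stated here as general Python import behavior and not as something from the original post: to `import object_detection` as a package, the directory placed on `sys.path` must be the one that contains the `object_detection` folder. The helper `import_from_dir` and the package `demo_pkg` are hypothetical.

```python
import importlib
import os
import sys
import tempfile


def import_from_dir(module_name, directory):
    """Import module_name after making sure directory is on sys.path."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    # Make the import system notice files created after interpreter startup.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = os.path.join(tmp, "demo_pkg")  # hypothetical package for the demo
        os.makedirs(pkg)
        with open(os.path.join(pkg, "__init__.py"), "w") as fh:
            fh.write("ANSWER = 42\n")
        module = import_from_dir("demo_pkg", tmp)  # tmp is the parent of demo_pkg
        print(module.ANSWER)
```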
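Problem: libX11.so.6: cannot open shared object file: No such file or directory
Solution: 1. Turn off interactive plotting: matplotlib.use('agg') 2. Install libx11: apt-get install libx11-dev
Note: matplotlib needs a graphical backend to display plots. A Linux machine without a display cannot load one, so switch to the non-interactive agg backend.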
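Problem: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
Solution: pip uninstall numpy && pip install numpy
Note: The installed numpy version does not match the one other packages were built against; reinstall numpy.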
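Problem: Running a script on Linux fails with $'\r': command not found
Solution: 1. apt-get install dos2unix
2. dos2unix {file name}
Note: A shell script edited on Windows and uploaded to Linux keeps Windows line endings (\r\n), while Linux uses \n. The shell treats the leftover \r as part of the command, hence the error. dos2unix strips the \r characters from the script.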
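Where dos2unix cannot be installed, the same line-ending conversion can be approximated in a few lines of Python. This is a minimal sketch; the function names are hypothetical, and it only handles `\r\n` pairs, not lone `\r` characters.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace Windows CRLF line endings with Unix LF, leaving other bytes untouched."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Rewrite path in place with LF line endings; return True if the file changed."""
    with open(path, "rb") as fh:
        original = fh.read()
    converted = crlf_to_lf(original)
    if converted != original:
        with open(path, "wb") as fh:
            fh.write(converted)
    return converted != original
```

Working on bytes rather than text avoids any encoding guesswork and any implicit newline translation by Python itself.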
Problem: NotImplementedError: Cannot convert a symbolic Tensor (2nd_target:0) to a numpy array
Solution: pip install numpy==1.19.5
Note: The numpy version is wrong; downgrade from 1.20.2 to 1.19.5.
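A script can check the installed numpy version before training starts and warn when a downgrade is likely needed. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The helper names are hypothetical, and the `1.20.0` threshold is an assumption: the original post only reports that 1.20.2 fails and 1.19.5 works.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn the first three dotted components of a version string into a tuple of ints."""
    parts = []
    for part in text.split(".")[:3]:
        digits = re.match(r"\d+", part)
        parts.append(int(digits.group()) if digits else 0)
    return tuple(parts)


def installed_version(package):
    """Return the installed version string of package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def needs_downgrade(package, first_bad="1.20.0"):
    """True if package is installed at first_bad or newer (threshold is an assumption)."""
    current = installed_version(package)
    return current is not None and parse_version(current) >= parse_version(first_bad)
```

For example, `needs_downgrade("numpy")` could gate a message telling the user to run the pip command from the entry above.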
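Problem: git pull fails with fatal: refusing to merge unrelated histories
Solution: git pull origin master --allow-unrelated-histories
Note: Commits were merged in the remote repository, so the local and remote commit histories no longer match. Cloning the repository again also fixes it. Make a habit of pulling before every code change.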
Problem: self._abc_registry = extra._abc_registry
AttributeError: type object 'Callable' has no attribute '_abc_registry'
Solution: pip uninstall typing. If the error persists, also run pip uninstall dataclasses.
Problem: pip install autogluon fails with AttributeError: type object 'Callable' has no attribute '_abc_registry'
Solution: pip uninstall typing
Problem: import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'
Solution: apt-get install -y python3-apt python-apt python-dev python3-dev
python3.7 -m pip install --upgrade setuptools pip
Problem: Installing autogluon or ConfigSpace fails with:
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
ERROR: Could not build wheels for ConfigSpace which use PEP 517 and cannot be installed directly
Solution: sudo apt-get install python3 python-dev python3-dev build-essential libssl-dev libffi-dev libxml2-dev libxslt1-dev zlib1g-dev python-pip
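Problem: PyTorch fails at correct_k = correct[:k].view(-1).float().sum(0).item() with RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
Solution: Call contiguous() before operating on the tensor, for example tensor.contiguous().view(). Using .reshape() in place of .view(), as the error message suggests, also works.
Note: The tensor is not contiguous in memory; the author encountered this during multi-GPU training.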
Problem: Installing onnxruntime 1.2.1 fails with:
CMake Error at CMakeLists.txt:183 (message):
Protobuf compiler not found
Call Stack (most recent call first):
CMakeLists.txt:202 (RELATIVE_PROTOBUF_GENERATE_CPP)
Solution: apt-get install libprotobuf-dev protobuf-compiler -y && export CMAKE_ARGS="-DONNX_USE_PROTOBUF_SHARED_LIBS=ON"
Problem: autogluon AutoML fails with:
Fitting model: KNeighborsUnif ... Training model for up to 3599.93s of the 3599.93s of remaining time.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
Solution: Pass hyperparameters={"KNN":,"n_jobs":16} to fit.
Problem: fatal error: Python.h: No such file or directory
Solution: sudo apt-get install python3.7-dev
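Whether the headers are present for the interpreter in use can be checked from Python itself, which also shows which `-dev` package version is needed. A minimal sketch with hypothetical helper names:

```python
import os
import sysconfig


def python_header_path():
    """Return the path where this interpreter expects to find Python.h."""
    return os.path.join(sysconfig.get_paths()["include"], "Python.h")


def has_python_headers():
    """True if Python.h exists for the running interpreter."""
    return os.path.isfile(python_header_path())


if __name__ == "__main__":
    print(python_header_path(), "found" if has_python_headers() else "MISSING")
```

Running it with the same interpreter that pip uses (for example `python3.7`) matters, because each Python version has its own include directory.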
Problem: Installing pycuda fails.
Solution: pip install pycuda --global-option="-I/usr/local/cuda-10.0/targets/aarch64-linux/include/" --global-option="-L/usr/local/cuda-10.0/targets/aarch64-linux/lib/"
Problem: fatal: unable to access 'http://apulis-gitlab.apulis.cn/apulis/model-gallery/': Problem with the SSL CA cert (path? access rights?)
Solution: sudo apt install -y ca-certificates
Note: The certificates in the image have expired.
Problem: Error: No such container:path: pytorch_backend_ptlib:/opt/conda/lib/libomp.so
Solution: apt-get install libomp5 libomp-dev -y
cp /usr/lib/x86_64-linux-gnu/libomp.so .


Reposted from blog.csdn.net/weixin_43953700/article/details/123698852