The theory has already been covered in detail by many other blog posts, so I won't repeat it here; this post is only about getting the code to run.
The official code has pitfalls.
To run it, you must compile it first, and the official code only builds against CUDA 10 and PyTorch 1.1x. With CUDA 11/12 or a newer version of torch, the build is guaranteed to fail.
Source code:
https://github.com/microsoft/GLIP
I have already patched all of these pitfalls; if you would rather skip the struggle, you can use my modified fork directly:
https://github.com/yblir/GLIP_detection
Usage: after the build succeeds, run glip_predict.py in the repo root.
1. Compilation
If, after a successful build, you still see ImportError: cannot import name '_C' from 'maskrcnn_benchmark', move the generated _C.cp38xxx file into the maskrcnn_benchmark package root. Running the code without building at all raises the same error, for the same reason: the _C.cp38xxx file is missing.
I have verified that the build succeeds on both Windows 10 and Linux.
Most problems encountered during the build can be resolved with this blog post (or, if you would rather not bother, use my fork above, which patches every pitfall without modifying any of the core code):
https://blog.csdn.net/code_zhao/article/details/129172817
Pay particular attention to the passage in that post about the CUDA grid-size computation; following its red-highlighted instruction, the line is changed to roughly (the ** is a placeholder elided in the original post):
dim3 grid(std::min(ceil_div(int(**), 512), 4096));
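For reference, the build itself is the standard maskrcnn_benchmark-style extension build. Assuming a CUDA toolchain that matches your PyTorch, run it from the GLIP repo root (add --user if you are installing outside a virtual environment):

python setup.py build develop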
2. Code debugging
2.1 An error caused by newer PyTorch versions; the code behind it is unused here and can be commented out
AttributeError: module 'torch' has no attribute '_six'
maskrcnn_benchmark/utils/imports.py
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
import torch

# if torch._six.PY37:
#     import importlib
#     import importlib.util
#     import sys
#
#     # from https://stackoverflow.com/questions/67631/how-to-import-a-module-given-the-full-path?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa
#     def import_file(module_name, file_path, make_importable=False):
#         spec = importlib.util.spec_from_file_location(module_name, file_path)
#         module = importlib.util.module_from_spec(spec)
#         spec.loader.exec_module(module)
#         if make_importable:
#             sys.modules[module_name] = module
#         return module
# else:
import imp


def import_file(module_name, file_path, make_importable=None):
    module = imp.load_source(module_name, file_path)
    return module
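Note that the imp module used here is itself deprecated and was removed in Python 3.12. If your interpreter no longer ships it, the commented-out importlib branch above does the same job; a minimal standalone version of it:

import importlib.util
import sys


def import_file(module_name, file_path, make_importable=False):
    # load a module directly from a file path (importlib replacement for imp.load_source)
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if make_importable:
        sys.modules[module_name] = module
    return module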
2.2 Rename an imported function
ImportError: cannot import name '_download_url_to_file' from 'torch.utils.model_zoo'
# remove the leading underscore from _download_url_to_file on line 6:
#from torch.hub import _download_url_to_file
from torch.hub import download_url_to_file
2.3 Download the model manually
OSError: Can't load config for 'bert-base-uncased'.
If you were trying to load it from 'https://huggingface.co/models', make sure you don't have
a local directory with the same name. Otherwise, make sure 'bert-base-uncased' is the correct
path to a directory containing a config.json file
The model is now loaded locally: create a folder named bert_base_uncased (it must be exactly this name) in the project root. Because the code fetches the model via from_pretrained('bert_base_uncased'), a local folder with the same name effectively overrides the download path; achieving the same effect otherwise would mean editing many config files.
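A minimal one-time download sketch (run it once on a machine with internet access; it assumes the standard transformers Auto* classes and saves into the folder name required above):

from transformers import AutoConfig, AutoModel, AutoTokenizer

# the hub id is 'bert-base-uncased'; the local folder name follows the note above
for cls in (AutoConfig, AutoModel, AutoTokenizer):
    cls.from_pretrained("bert-base-uncased").save_pretrained("bert_base_uncased")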
2.4 nltk_data: a 600 MB+ package that sometimes fails to download even when online; download it manually and fix the load path
xml.etree.ElementTree.ParseError: unclosed token: line 472, column 6
Download it from:
https://github.com/nltk/nltk_data/tree/gh-pages
Put it wherever you like, remember to manually unzip punkt inside tokenizers, then add that location to NLTK's search path and comment out the two download calls, as shown in the sketch below. The file to edit is:
maskrcnn_benchmark/engine/predictor_glip.py
Here the nltk_data directory holds the files from the packages folder of the downloaded repo.
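A sketch of the edit in predictor_glip.py; the path is an example, and the exact nltk.download(...) lines being commented out may differ in your copy:

import nltk

# point NLTK at the manually downloaded data instead of downloading at runtime;
# adjust the path to wherever you unpacked nltk_data
nltk.data.path.append(r"D:\nltk_data")
# nltk.download("punkt")
# nltk.download("averaged_perceptron_tagger")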
2.5 NumPy version issue: newer versions no longer support np.float; replace every occurrence in the code with np.float32
AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`.
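For example (a hypothetical line, purely for illustration):

# before: fails on NumPy >= 1.24, where the np.float alias was removed
scores = scores.astype(np.float)
# after:
scores = scores.astype(np.float32)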
2.6 A normal warning; you can ignore it
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.dense.weight', 'bert.pooler.dense.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'bert.pooler.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
It can be silenced by raising the transformers logging level:
from transformers import logging
logging.set_verbosity_error()
3. Prediction code
import warnings
warnings.filterwarnings("ignore")
from transformers import logging
logging.set_verbosity_error()
# pylab.rcParams['figure.figsize'] = 20, 12
from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.engine.predictor_glip import GLIPDemo
import cv2
import numpy as np
import torch
from PIL import Image, ImageDraw, ImageFont
class Colors:
    # Ultralytics color palette https://ultralytics.com/
    def __init__(self):
        hexs = (
            "FF3838", "FF9D97", "FF701F", "FFB21D", "CFD231", "48F90A", "92CC17", "3DDB86", "1A9334", "00D4BB",
            "2C99A8", "00C2FF", "344593", "6473FF", "0018EC", "8438FF", "520085", "CB38FF", "FF95C8", "FF37C7",
        )
        self.palette = [self.hex2rgb(f"#{c}") for c in hexs]
        self.n = len(self.palette)

    def __call__(self, i, bgr=False):
        """Returns color from palette by index `i`, in BGR format if `bgr=True`, else RGB; `i` is an integer index."""
        c = self.palette[int(i) % self.n]
        return (c[2], c[1], c[0]) if bgr else c

    @staticmethod
    def hex2rgb(h):
        """Converts hexadecimal color `h` to an RGB tuple (PIL-compatible) with order (R, G, B)."""
        return tuple(int(h[1 + i: 1 + i + 2], 16) for i in (0, 2, 4))
def draw_images(image, boxes, classes, scores, colors, xyxy=True):
    if isinstance(image, np.ndarray):
        image = Image.fromarray(image[:, :, ::-1])
    if isinstance(boxes, torch.Tensor):
        boxes = boxes.cpu().numpy()
    # set up the font for the Pillow drawing step
    font = ImageFont.truetype(font='configs/simhei.ttf', size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32'))
    # number of repeated box outlines: scale the border thickness with image size
    thickness = max((image.size[0] + image.size[1]) // 300, 1)
    draw = ImageDraw.Draw(image)
    for i, box in enumerate(boxes):
        x1, y1, x2, y2 = box
        color = colors[i]
        label = '{}:{:.2f}'.format(classes[i], scores[i])
        tx1, ty1, tx2, ty2 = font.getbbox(label)
        tw, th = tx2 - tx1, ty2 - ty1
        text_origin = np.array([x1, y1 - th]) if y1 - th >= 0 else np.array([x1, y1 + 1])
        # redraw the rectangle several times, offset by a few pixels, to thicken the border
        for j in range(thickness):
            draw.rectangle((x1 + j, y1 + j, x2 - j, y2 - j), outline=color)
        # draw the label box and text
        draw.rectangle((text_origin[0], text_origin[1], text_origin[0] + tw, text_origin[1] + th), fill=color)
        draw.text(text_origin, label, fill=(0, 0, 0), font=font)
    return image
config_file = "configs/pretrain/glip_Swin_T_O365_GoldG.yaml"
weight_file = r'E:\PyCharm\PreTrainModel\glip_tiny_model_o365_goldg_cc_sbu.pth'
cfg.local_rank = 0
cfg.num_gpus = 1
cfg.merge_from_file(config_file)
cfg.merge_from_list(["MODEL.WEIGHT", weight_file])
cfg.merge_from_list(["MODEL.DEVICE", "cuda"])
glip_demo = GLIPDemo(
    cfg,
    min_image_size=800,
    confidence_threshold=0.7,
    show_mask_heatmaps=False
)
def glip_inference(image_, caption_):
    # assign colors per class; the classes extracted from the caption vary
    colors_ = Colors()
    preds = glip_demo.compute_prediction(image_, caption_)
    top_preds = glip_demo._post_process(preds, threshold=0.5)
    # extract predicted classes, scores and boxes from the predictions
    labels = top_preds.get_field("labels").tolist()
    scores = top_preds.get_field("scores").tolist()
    boxes = top_preds.bbox.detach().cpu().numpy()
    # pick a box color for each predicted class
    colors = [colors_(idx) for idx in labels]
    # map the numeric labels to class names
    labels_names = glip_demo.get_label_names(labels)
    return boxes, scores, labels_names, colors
if __name__ == '__main__':
    # caption = 'bobble heads on top of the shelf'
    # caption = "Striped bed, white sofa, TV, carpet, person"
    # caption = "table on carpet"
    caption = "Table, TV"
    image = cv2.imread('docs/demo.jpg')

    boxes, scores, labels_names, colors = glip_inference(image, caption)
    print(labels_names, scores)
    print(boxes)

    image = draw_images(image=image, boxes=boxes, classes=labels_names, scores=scores, colors=colors)
    image.show()
4. Analysis of results
Different prompts produce all kinds of surprising results.
5. Summary
If an object named in the prompt is not in the image, the model may still force a detection onto something.
It cannot handle sentences that are too long; they exceed what it can parse.
Prompt wording matters a great deal: an object that goes undetected may be found simply by rephrasing it.
As an open-set detection model, GLIP is still excellent: it supports zero-shot detection and can handle video detection, fast auto-labeling, and similar tasks. In terms of accuracy, though, it cannot yet match supervised detectors.
For industrial deployment, forget zero-shot: Schrödinger-style detection results are not dependable. Fine-tuning should improve things considerably.
My company once deployed a CLIP-based vision model, and even the base version strained our runtime performance; GLIP will face the same problem on its way to production.