Paddle AI Studio can play multi-modal? MiniGPT4 practical exercise!

MiniGPT4 is an improved version based on GPT3. Its parameters are an order of magnitude less than GPT3, but its performance on multiple natural language processing tasks is not inferior to GPT3. The project author uses MiniGPT4-7B as a practical exercise project.

Creator: Yan Zhe

Experience link:
https://aistudio.baidu.com/aistudio/projectdetail/6556667

One key fork

Fork the project and run it. It is recommended to select at least A100 (40G) and above configuration for the operating environment

picture

Install related modules

1import os 
2os.system("pip install --pre --upgrade paddlenlp -f https://www.paddlepaddle.org.cn/whl/paddlenlp.html") # 安装nlp分支最新包
3os.system("pip install paddlepaddle-gpu==0.0.0.post112 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html")
4os.system("pip install tqdm")
5!pip install ipywidgets

Reference related modules

 1%%capture
 2os.environ["CUDA_VISIBLE_DEVICES"] = "0"
 3os.environ["FLAGS_use_cuda_managed_memory"] = "true"
 4import requests
 5from PIL import Image
 6import gradio as gr
 7from tqdm import tqdm
 8import ipywidgets as widgets
 9from IPython.display import display
10import csv    
11from itertools import islice 
12from paddlenlp.transformers import MiniGPT4ForConditionalGeneration, MiniGPT4Processor

Download miniGPT4 weights or configuration files

1!mkdir minigpt4
 1%%capture
 2os.system("wget -O  minigpt4/model_config.json https://bj.bcebos.com/v1/ai-studio-online/924ed883c17b4b8b88b4a1f98e24d34b3b00160ac9bd4b3ba478aff6974e0e9d?responseContentDisposition=attachment%3B%20filename%3Dmodel_config.json ")
 3!wget -O  ./minigpt4/model_state.pdparams    https://bj.bcebos.com/v1/ai-studio-online/18bd53eaa2854263ba31fb4d75f31a5f0d38421a6da64525bff6da230389fc36?responseContentDisposition=attachment%3B%20filename%3Dmodel_state.pdparams
 4!wget -O  ./minigpt4/generation_config.json  https://bj.bcebos.com/v1/ai-studio-online/f0b2129d6a934a97abcaa139ac1f28e33a6940004c7a4c859737f282640cf332?responseContentDisposition=attachment%3B%20filename%3Dgeneration_config.json
 5!wget -O  ./minigpt4/preprocessor_config.json https://bj.bcebos.com/v1/ai-studio-online/748c332837d34f389d762f487470b1a7221edd36ccb5484b913bd2d3855ee9f6?responseContentDisposition=attachment%3B%20filename%3Dpreprocessor_config.json
 6!wget -O  ./minigpt4/sentencepiece.bpe.model https://bj.bcebos.com/v1/ai-studio-online/0139a1bfcdf84058b77cea4631837340ea94f5fcc37445929a3414f05d07579b?responseContentDisposition=attachment%3B%20filename%3Dsentencepiece.bpe.model
 7!wget  -O  ./minigpt4/special_tokens_map.json https://bj.bcebos.com/v1/ai-studio-online/90b16a96d4f94200ab417b39dcf3bce4ddef5885625c4d0c8e70b3f659cb6993?responseContentDisposition=attachment%3B%20filename%3Dspecial_tokens_map.json
 8!wget -O  ./minigpt4/tokenizer.json  https://bj.bcebos.com/v1/ai-studio-online/e877a685eb86499cb87e1c4cbf85353856506d12e9a841a292e780aa4a9e188a?responseContentDisposition=attachment%3B%20filename%3Dtokenizer.json
 9!wget  -O  ./minigpt4/tokenizer_config.json  https://bj.bcebos.com/v1/ai-studio-online/f93064db167c4075b1f86d6878cac9303fb8df418f7a42a7900785a6e188cc44?responseContentDisposition=attachment%3B%20filename%3Dtokenizer_config.json
10--2023-07-27 10:54:29--  https://bj.bcebos.com/v1/ai-studio-online/924ed883c17b4b8b88b4a1f98e24d34b3b00160ac9bd4b3ba478aff6974e0e9d?responseContentDisposition=attachment%3B%20filename%3Dmodel_config.json
11Resolving bj.bcebos.com (bj.bcebos.com)... 182.61.200.195, 182.61.200.229, 2409:8c04:1001:1002:0:ff:b001:368a
12Connecting to bj.bcebos.com (bj.bcebos.com)|182.61.200.195|:443... connected.
13HTTP request sent, awaiting response... 200 OK
14Length: 5628 (5.5K) [application/octet-stream]
15Saving to: 'minigpt4/model_config.json'

Instantiate the miniGPT4 model and processor

1model_path ='./minigpt4'
2model = MiniGPT4ForConditionalGeneration.from_pretrained(model_path)
3model.eval()
4processor = MiniGPT4Processor.from_pretrained(model_path)

model reasoning

Input image url+prompt (single picture + single round of dialogue)

There is another form of uploading images locally, please enter the project to view

 1def predict_per_url_prompt(url=None,text=None):
 2    if url==None:
 3        url = "https://paddlenlp.bj.bcebos.com/data/images/mugs.png"
 4    image = Image.open(requests.get(url, stream=True).raw)
 5    if text== None:
 6        text = "describe this image"
 7
 8    prompt = "Give the following image: <Img>ImageContent</Img>. You will be able to see the image once I provide it to you. Please answer my questions.###Human: <Img><ImageHere></Img> <TextHere>###Assistant:"
 9
10    inputs = processor([image], text, prompt)
11
12    generate_kwargs = {
13        "max_length": 300,
14        "num_beams": 1,
15        "top_p": 1.0,
16        "repetition_penalty": 1.0,
17        "length_penalty": 0,
18        "temperature": 1,
19        "decode_strategy": "greedy_search",
20        "eos_token_id": [[835], [2277, 29937]],
21    }
22    outputs = model.generate(**inputs, **generate_kwargs)
23    msg = processor.batch_decode(outputs[0])
24    return msg[0][0:-5]

file_path+prompt after uploading the image locally (multiple pictures + single round of dialogue)

 1def predict_dir_and_one_prompt_out_list(dir_path=None,text=None):
 2    import os 
 3    assert os.path.isdir(dir_path),print('请输入文件夹路径,而不是图像路径')
 4    output = []
 5    for per_image_name in tqdm (os.listdir(dir_path)):
 6        image = Image.open(os.path.join(dir_path,per_image_name))
 7        if text== None:
 8            text = "describe this image"
 9        else:
10            text = text
11
12        prompt = "Give the following image: <Img>ImageContent</Img>. You will be able to see the image once I provide it to you. Please answer my questions.###Human: <Img><ImageHere></Img> <TextHere>###Assistant:"
13
14        inputs = processor([image], text, prompt)
15
16        generate_kwargs = {
17            "max_length": 300,
18            "num_beams": 1,
19            "top_p": 1.0,
20            "repetition_penalty": 1.0,
21            "length_penalty": 0,
22            "temperature": 1,
23            "decode_strategy": "greedy_search",
24            "eos_token_id": [[835], [2277, 29937]],
25        }
26        outputs = model.generate(**inputs, **generate_kwargs)
27        msg = processor.batch_decode(outputs[0])
28        output.append(msg[0][0:-5])
29    return output

Show results

Input: describe this picture, use Chinese

picture

Output: This image shows a female character wearing a red and white costume and holding a golden sword. Her hair is white and her eyes are red. She is standing in a meadow, holding the hilt of her sword. This character looks like a hero, her costume and equipment show her strength and courage

1predict_per_url_prompt(url='https://ai-studio-static-online.cdn.bcebos.com/d283b05404bd44b69b9be868fddb67616296858284bf4ad587e29432de66e930',text="描述这张图片,使用中文")
2'这张图片显示了一个女性角色,穿着红色和白色的服装,手持一根金色的剑。她的头发是白色的,眼睛是红色的。她站在一张草地上,手持剑的柄子。这个角色看起来像是一个英雄,她的服装和装备显示出她的力量和勇气'

For more ways to play, you can fork the project with one click to fine-tune the model.

Click the link below to experience more large model applications immediately.

https://aistudio.baidu.com/aistudio/application/center

Guess you like

Origin blog.csdn.net/PaddlePaddle/article/details/132016508