LoRA fine-tuning of ChatGLM2-6B on P40 | JD Cloud technical team

Background:

Large-model applications are currently blooming everywhere, and the quickest way to apply them is to fine-tune a model with data from your own vertical domain. ChatGLM2-6B stands out among domestic open-source large models. This article shares how to run LoRA fine-tuning of ChatGLM2-6B for a vertical domain on the Group EA P40 machines.

1. Introduction to ChatGLM2-6B

github: https://github.com/THUDM/ChatGLM2-6B

ChatGLM2-6B has several improvements over the first-generation ChatGLM-6B:

1. Stronger performance: the base model of ChatGLM2-6B has been upgraded and achieves good results across various dataset evaluations;

2. Longer context: the context length of the base model has been extended from ChatGLM-6B's 2K to 32K, and an 8K context length is used for training in the dialogue stage;

3. More efficient inference: based on Multi-Query Attention, ChatGLM2-6B has faster inference and lower GPU memory usage; under the official model implementation, inference speed is 42% higher than the first generation;

4. More open license: the ChatGLM2-6B weights are fully open for academic research, and free commercial use is also allowed after completing the registration questionnaire.

2. Fine-tuning environment

2.1 Performance requirements

For inference, ChatGLM2-6B needs only about 14 GB of GPU memory in FP16, so the P40 can cover it.

The configuration of the P40 graphics card on EA is as follows:

2.2 Docker image environment

Before fine-tuning, you need to set up the build environment. I use a Docker image to provide the environment; the specific Dockerfile is as follows:

FROM base-clone-mamba-py37-cuda11.0-gpu

# mpich
RUN yum install -y mpich

# create my own environment
RUN conda create -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/ --override-channels --yes --name py39 python=3.9
# display my own environment in Launcher
RUN source activate py39 \
    && conda install --yes --quiet ipykernel \
    && python -m ipykernel install --name py39 --display-name "py39"

# install your own requirement package
RUN source activate py39 \
    && conda install -y -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/ \
    pytorch  torchvision torchaudio faiss-gpu \
    && pip install --no-cache-dir  --ignore-installed -i https://pypi.tuna.tsinghua.edu.cn/simple \
    protobuf \
    streamlit \
    transformers==4.29.1 \
    cpm_kernels \
    mdtex2html \
    gradio==3.28.3 \
    sentencepiece \
    accelerate \
    langchain \
    pymupdf \
    unstructured[local-inference] \
    layoutparser[layoutmodels,tesseract] \
    nltk~=3.8.1 \
    sentence-transformers \
    beautifulsoup4 \
    icetk \
    fastapi~=0.95.0 \
    uvicorn~=0.21.1 \
    pypinyin~=0.48.0 \
    click~=8.1.3 \
    tabulate \
    feedparser \
    azure-core \
    openai \
    pydantic~=1.10.7 \
    starlette~=0.26.1 \
    numpy~=1.23.5 \
    tqdm~=4.65.0 \
    requests~=2.28.2 \
    rouge_chinese \
    jieba \
    datasets \
    deepspeed \
    pdf2image \
    urllib3==1.26.15 \
    tenacity~=8.2.2 \
    autopep8 \
    paddleocr \
    mpi4py \
    tiktoken

If you want to train with DeepSpeed, note that the mpich message-passing toolkit is missing on EA and has to be installed manually.

2.3 Model download

huggingface address: https://huggingface.co/THUDM/chatglm2-6b/tree/main
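
If the training machine cannot reach Hugging Face at runtime, the weights can be pulled to a local directory beforehand. Below is a minimal sketch using huggingface_hub; the target path "./chatglm2-6b" is only an assumed example:

    # minimal sketch: download the ChatGLM2-6B weights to a local folder so that
    # from_pretrained can later load them from a local path
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id="THUDM/chatglm2-6b", local_dir="./chatglm2-6b")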

3. LoRA fine-tuning

3.1 Introduction to LoRA

paper: https://arxiv.org/pdf/2106.09685.pdf

LoRA (Low-Rank Adaptation of Large Language Models) fine-tuning method: freeze the pre-trained model's weights, add extra low-rank layers alongside the model, and train only the parameters of these newly added layers.

The idea behind LoRA:

  • Add a bypass branch next to the original PLM (Pre-trained Language Model) that performs a dimensionality reduction followed by a dimensionality expansion.
  • During training, the PLM's parameters are frozen and only the down-projection matrix A and the up-projection matrix B are trained. The model's input and output dimensions stay unchanged, and the output of BA is added to the PLM's output (see the sketch after this list).
  • A is initialized from a random Gaussian distribution and B with zeros, so the bypass matrix BA is still zero at the start of training.
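
As a concrete illustration of the points above, here is a minimal hand-rolled sketch of the LoRA bypass on a single linear layer. It is for intuition only; the actual fine-tuning below uses the peft library, and the r/alpha values here are arbitrary:

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Wrap a frozen nn.Linear with a trainable low-rank bypass B @ A."""
        def __init__(self, base: nn.Linear, r: int = 8, lora_alpha: int = 32):
            super().__init__()
            self.base = base
            for p in self.base.parameters():      # freeze the pre-trained weights
                p.requires_grad = False
            # A: random Gaussian init, B: zero init, so BA starts as a zero matrix
            self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
            self.scaling = lora_alpha / r

        def forward(self, x):
            # output of the frozen layer plus the scaled low-rank bypass
            return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling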

3.2 Fine-tuning

The peft library provided by Hugging Face makes it easy to fine-tune a PLM, and it is also what is used here to build the LoRA model.

peft's github: https://gitcode.net/mirrors/huggingface/peft?utm_source=csdn_github_accelerator

Load the model and apply LoRA fine-tuning:

    # imports needed to run this snippet
    import torch
    from transformers import AutoTokenizer, AutoModel
    from peft import LoraConfig, get_peft_model

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # load model
    tokenizer = AutoTokenizer.from_pretrained(args.model_dir, trust_remote_code=True)
    model = AutoModel.from_pretrained(args.model_dir, trust_remote_code=True)

    print("tokenizer:", tokenizer)

    # build the LoRA config
    config = LoraConfig(
        r=args.lora_r,
        lora_alpha=32,
        lora_dropout=0.1,
        bias="none",)

    # wrap the base model with the LoRA adapter
    model = get_peft_model(model, config)
    # run in half precision
    model = model.half().to(device)
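
After wrapping, a quick sanity check shows how small the trainable parameter set is. If your peft version cannot infer the LoRA target modules for ChatGLM2, passing target_modules=["query_key_value"] to LoraConfig is a commonly used setting (treat that as an assumption, not part of the script above):

    # optional check: how many parameters LoRA actually trains
    model.print_trainable_parameters()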

Note that when Hugging Face loads a local model it still needs a working cache directory. On EA there is no permission to write to the default .cache location, so the working path has to be specified first:

import os
os.environ['TRANSFORMERS_CACHE'] = os.path.dirname(os.path.abspath(__file__))+"/work/"
os.environ['HF_MODULES_CACHE'] = os.path.dirname(os.path.abspath(__file__))+"/work/"


If you need to train with DeepSpeed, select the ZeRO stage you need:

    conf = {"train_micro_batch_size_per_gpu": args.train_batch_size,
            "gradient_accumulation_steps": args.gradient_accumulation_steps,
            "optimizer": {
                "type": "Adam",
                "params": {
                    "lr": 1e-5,
                    "betas": [
                        0.9,
                        0.95
                    ],
                    "eps": 1e-8,
                    "weight_decay": 5e-4
                }
            },
            "fp16": {
                "enabled": True
            },
            "zero_optimization": {
                "stage": 1,
                "offload_optimizer": {
                    "device": "cpu",
                    "pin_memory": True
                },
                "allgather_partitions": True,
                "allgather_bucket_size": 2e8,
                "overlap_comm": True,
                "reduce_scatter": True,
                "reduce_bucket_size": 2e8,
                "contiguous_gradients": True
            },
            "steps_per_print": args.log_steps
            }
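
Below is a minimal sketch of how the conf dict above could be wired into DeepSpeed; it assumes the LoRA-wrapped model from earlier and hands only trainable parameters to the optimizer defined in the config:

    import deepspeed

    # initialize the DeepSpeed engine with the config dict defined above
    model_engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        config=conf,
        model_parameters=[p for p in model.parameters() if p.requires_grad],
    )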

The rest is data processing. What deserves particular attention is how the prompts are built: for domain fine-tuning, I personally think prompt construction is very important and ultimately has a significant impact on the model.
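
For illustration only, here is one way a single-round training prompt could be assembled for ChatGLM2-style dialogue data; the field names instruction/input/output and the round template are assumptions, not necessarily the format used in this project:

    # hedged sketch: build a single-round ChatGLM2-style training prompt
    def build_example(sample: dict):
        prompt = f"[Round 1]\n\n问：{sample['instruction']}{sample['input']}\n\n答："
        return prompt, sample["output"]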

4. Fine-tuning results

At the time of writing, the model is still fine-tuning with batch size 1 and 3 epochs; one epoch has been completed so far.

Author: JD Retail Zheng Shaoqiang

Source: JD Cloud Developer Community. Please indicate the source when reprinting.
