Bloom & LLaMA large models: SFT (model fine-tuning)

0. Introduction

With the explosion of ChatGPT, many large models have appeared recently, such as the Bloom series and the LLaMA-based Ziya and Baichuan. These models are more promising than ChatGLM because they are fully available for commercial use and can be iterated on. The author has recently been studying hiyouga's LLaMA-Efficient-Tuning; compared with other projects, it is very suitable for learning and getting started.

1. What is SFT

SFT (Supervised Fine-Tuning) is a natural language processing technique that fine-tunes a pre-trained language model to adapt it to a specific task. In large-model SFT, large pre-trained language models such as LLaMA and GPT are used. These models have billions or even tens of billions of parameters and can process large amounts of text data.

The main idea of SFT is to take a large pre-trained model and fine-tune it for a specific task. During fine-tuning, the model's parameters (and, for some methods, its structure) are adjusted according to the characteristics of the task to improve its performance on that task. Different techniques can be used during fine-tuning, such as data augmentation, regularization, and different optimization algorithms.

The advantage of SFT is that a model can be adapted quickly to different tasks without retraining the entire model from scratch. In addition, because large pre-trained models are used, they benefit from the massive text data seen during pre-training, which generally leads to better performance. However, SFT also has drawbacks: fine-tuning requires substantial computing resources and time, and problems such as overfitting may occur.
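As a minimal illustration of the idea, supervised fine-tuning simply continues training a causal language model on task-specific text with the standard next-token cross-entropy loss. The sketch below is generic and not the project's code; the small Bloom checkpoint and the toy prompt are placeholders.

    # Generic SFT sketch (illustrative only): continue training a pretrained causal LM
    # on task-specific (prompt, response) text with the next-token cross-entropy loss.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "bigscience/bloom-560m"  # placeholder checkpoint for illustration
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    text = "Instruction: say hello in French.\nResponse: Bonjour."
    inputs = tokenizer(text, return_tensors="pt")
    # labels = input_ids: the model learns to reproduce the target tokens;
    # in practice the prompt positions are usually masked with -100.
    outputs = model(**inputs, labels=inputs["input_ids"])
    outputs.loss.backward()  # one fine-tuning step (optimizer step omitted for brevity)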

Commonly used SFT methods currently include P-Tuning v2, LoRA, QLoRA, Freeze, full-parameter fine-tuning, and others. Let's take a look at how SFT is written in LLaMA-Efficient-Tuning.
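Before diving into the script, here is a rough orientation on what one of these methods, LoRA, looks like with the peft library. This is a minimal sketch under the assumption of a Bloom-style model; LLaMA-Efficient-Tuning wires this kind of setup up internally from its own arguments.

    # Minimal LoRA sketch with the peft library (illustrative; not the project's internal code)
    from peft import LoraConfig, TaskType, get_peft_model
    from transformers import AutoModelForCausalLM

    base_model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=8,                                 # rank of the low-rank update matrices
        lora_alpha=32,                       # scaling factor for the update
        lora_dropout=0.05,
        target_modules=["query_key_value"],  # attention projection modules in Bloom
    )
    peft_model = get_peft_model(base_model, lora_config)
    peft_model.print_trainable_parameters()  # only the small LoRA matrices are trainable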


2. Code reading – train_sft.py

The following is the SFT script for the large model; it mainly covers model and data preparation, dataset splitting, training, and evaluation.

First, the code imports some necessary modules and functions, including utility functions for data processing, training, loading pretrained models, and plotting loss curves. (This part is the same as in pt.)

    # Prepare pretrained model and dataset
    model_args, data_args, training_args, finetuning_args = prepare_args(stage="sft")  # prepare the various arguments: model, data, training, and fine-tuning arguments
    dataset = prepare_data(model_args, data_args)  # prepare the dataset
    model, tokenizer = load_pretrained(model_args, finetuning_args, training_args.do_train, stage="sft")  # load the model and tokenizer for SFT fine-tuning
    dataset = preprocess_data(dataset, tokenizer, data_args, training_args, stage="sft")  # preprocess the data, e.g. convert text into a format the model can understand
    data_collator = DynamicDataCollatorWithPadding(tokenizer, data_args.ignore_pad_token_for_loss)  # dynamically pad the data so that every example in a batch has the same length
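DynamicDataCollatorWithPadding is defined by the project itself. Conceptually, dynamic padding works like the sketch below (illustrative only, not the actual implementation): each batch is padded to the length of its longest example, and when ignore_pad_token_for_loss is enabled the padded label positions are set to -100 so the loss ignores them.

    # Conceptual sketch of dynamic padding (not the project's actual class)
    import torch

    def dynamic_pad(features, pad_token_id, ignore_pad_token_for_loss=True):
        max_len = max(len(f["input_ids"]) for f in features)
        label_pad = -100 if ignore_pad_token_for_loss else pad_token_id
        input_ids, labels = [], []
        for f in features:
            pad_len = max_len - len(f["input_ids"])
            input_ids.append(f["input_ids"] + [pad_token_id] * pad_len)
            labels.append(f["labels"] + [label_pad] * pad_len)
        return {"input_ids": torch.tensor(input_ids), "labels": torch.tensor(labels)}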

The code below is used to override the decoding parameters of Seq2SeqTrainer

    # Override the decoding parameters of Seq2SeqTrainer
    training_args.generation_max_length = training_args.generation_max_length if \
                training_args.generation_max_length is not None else data_args.max_target_length  # set the maximum generation length in the training arguments (training_args)
    training_args.generation_num_beams = data_args.eval_num_beams if \
                data_args.eval_num_beams is not None else training_args.generation_num_beams  # set the number of beams for beam search (generation_num_beams) in the training arguments
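These two values are what the trainer uses for generation during evaluation when predict_with_generate is enabled. Conceptually, the call is roughly equivalent to the following sketch, where input_ids stands for a hypothetical batch of tokenized evaluation prompts.

    # Rough equivalent of what happens at evaluation time (sketch only)
    generated = model.generate(
        input_ids,  # hypothetical batch of tokenized evaluation prompts
        max_length=training_args.generation_max_length,
        num_beams=training_args.generation_num_beams,
    )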

Then, the dataset is split depending on whether training is performed. If training is performed and the development-set ratio is greater than 0, the dataset is split into a training set and a development set; otherwise, the entire dataset is used for training. If no training is performed, all data is used for evaluation or prediction.

    # Split the dataset
    if training_args.do_train:
        if data_args.dev_ratio > 1e-6:
            dataset = dataset.train_test_split(test_size=data_args.dev_ratio)
            trainer_kwargs = {"train_dataset": dataset["train"], "eval_dataset": dataset["test"]}
        else:
            trainer_kwargs = {"train_dataset": dataset}
    else:  # do_eval or do_predict
        trainer_kwargs = {"eval_dataset": dataset}

Next, a Seq2SeqPeftTrainer object is initialized, passing in the fine-tuning arguments, model, training arguments, tokenizer, data collator, callbacks, and compute_metrics (all inherited from Seq2SeqTrainer), as well as the dataset splits prepared above. We will read through this class in detail in the next section.
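Based on the parameters listed above, the initialization looks roughly like the following. This is a sketch reconstructed from the description; helper names such as LogCallback and ComputeMetrics are assumed to be the project's own utilities and may differ in detail from the actual train_sft.py.

    # Initialize the trainer (sketch based on the description above)
    trainer = Seq2SeqPeftTrainer(
        finetuning_args=finetuning_args,
        model=model,
        args=training_args,
        tokenizer=tokenizer,
        data_collator=data_collator,
        callbacks=[LogCallback()],
        compute_metrics=ComputeMetrics(tokenizer) if training_args.predict_with_generate else None,
        **trainer_kwargs  # the train/eval datasets selected above
    )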

…For details, please refer to Gu Yueju

Origin blog.csdn.net/lovely_yoshino/article/details/131394309