LoRA Fine-Tuning and Evaluation of the phi-2 Large Language Model

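LLM Fine-Tuning

LLM fine-tuning is the process of further training a large language model (LLM) that has already been pretrained on a large corpus, so that it adapts to a specific task, domain, or application. During this stage the model learns more specialized knowledge and skills, which improves its performance on the target task.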

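The key steps in LLM fine-tuning are:

Model selection: decide whether to train a model from scratch or adapt an existing one. In most cases, adapting an existing pretrained model is far more efficient.
Data preparation: collect and prepare the fine-tuning dataset, which may involve cleaning, labeling, and building task-specific prompt templates.
Fine-tuning: split the dataset into training, validation, and test sets, then fine-tune the model on the training set to optimize its performance on the target task.
Model update: during fine-tuning, the model's weights are updated on labeled data by minimizing the difference between its predictions and the reference answers.
Evaluation and iteration: evaluate regularly with metrics and benchmarks, and iterate between prompt engineering, fine-tuning, and evaluation until the results are satisfactory.
Deployment: once the model performs as expected, deploy it, optimizing compute efficiency and user experience at this stage.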
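
The data-splitting step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper name `split_dataset` and the 80/10/10 ratio are hypothetical, and the dataset used later in this post already ships with its own train, validation, and test splits, so no manual split is needed there.

```python
import random

def split_dataset(samples, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle samples and split them into train/validation/test lists.

    Hypothetical helper for illustration; the fractions are assumptions.
    """
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

samples = [{"id": i} for i in range(100)]
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))
```

Fixing the seed keeps the three sets stable across runs, which matters when results from different fine-tuning runs are compared.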

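Types of fine-tuning

Parameter-efficient fine-tuning (PEFT): updates only a small subset of the model's parameters to adapt it to a task, which significantly reduces compute and memory cost.
Instruction fine-tuning: trains the model on examples that demonstrate how it should respond to a query, which can improve its performance across a range of tasks.
Full fine-tuning (FFT): updates all of the model's weights. Like pretraining, it needs far more memory and compute than PEFT.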

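Optimization approaches

Adapters: add small neural network modules (adapters) to the pretrained model and train only those modules.
Pruning: remove less important parameters to reduce model complexity, then train only the remaining parameters.
Knowledge distillation: train a small student model to mimic a large pretrained teacher model; the student acquires task knowledge by learning from the teacher's outputs.
Multi-task learning: train on a dataset containing input/output examples for several tasks, with the aim of improving performance on all of them at once.
Retrieval augmentation: combine text generation with information retrieval, so that the model grounds its answers in up-to-date external knowledge or relevant documents.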

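What is quantized LoRA (QLoRA)

QLoRA is a more memory-efficient variant of LoRA. It quantizes the frozen weights of the pretrained model to 4-bit precision (for example the NF4 format) when loading them onto the GPU, while the small LoRA adapter matrices are kept in higher precision and are the only weights that are trained. Standard LoRA, by contrast, typically keeps the base model in 16-bit or 8-bit precision. Quantizing the base model further reduces memory footprint and storage requirements, and despite the lower bit precision QLoRA remains about as effective as LoRA.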
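
To make the idea of 4-bit quantization concrete, here is a minimal sketch in plain Python. It is a deliberate simplification: it uses uniform absmax quantization on a single block of weights, whereas NF4 uses non-uniform levels and bitsandbytes handles blocks and packing internally. The function names and the example weights are hypothetical.

```python
def quantize_block(values, bits=4):
    """Absmax-quantize a block of floats to signed integer codes.

    Simplified illustration (uniform levels), not the NF4 scheme itself.
    Returns the integer codes and the scale needed to restore them.
    """
    levels = 2 ** (bits - 1) - 1                  # 7 for signed 4-bit codes
    absmax = max(abs(v) for v in values) or 1.0   # avoid dividing by zero
    scale = absmax / levels
    codes = [round(v / scale) for v in values]
    return codes, scale

def dequantize_block(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]

weights = [0.12, -0.53, 0.07, 0.91, -0.33, 0.0, 0.48, -0.88]  # made-up values
codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(codes)
print(f"max reconstruction error: {max_err:.4f} (bound: {scale / 2:.4f})")
```

Each weight is stored as a small integer plus one shared scale per block, which is where the memory saving comes from; the reconstruction error is at most half a quantization step.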

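Workflow for fine-tuning an LLM on a custom dataset with QLoRA

Install the required libraries
Load the dataset
Create the bitsandbytes configuration
Load the pretrained model
Tokenization
Test the original model
Preprocess the dataset
Prepare the model for QLoRA
Set up PEFT for fine-tuning
Train the PEFT adapter
Evaluate the model qualitatively (human evaluation)
Evaluate the model quantitatively (ROUGE metric)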

Install and import the required dependencies

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    Trainer,
    GenerationConfig
)
from tqdm import tqdm
from trl import SFTTrainer
import torch
import time
import pandas as pd
import numpy as np
from huggingface_hub import interpreter_login

# interpreter_login()  # Optional: log in to Hugging Face with a (free) access token

import os
# Disable Weights & Biases (W&B) tracking to reduce resource usage
os.environ['WANDB_DISABLED'] = "true"

Load the dataset

huggingface_dataset_name = "neil-code/dialogsum-test"
dataset = load_dataset(huggingface_dataset_name)  # reuses the local cache if the data was downloaded before
# Inspect one example
print(dataset['train'][0])

Output

{‘id’: ‘train_0’, ‘dialogue’: “#Person1#: Hi, Mr. Smith. I’m Doctor
Hawkins. Why are you here today?\n#Person2#: I found it would be a
good idea to get a check-up.\n#Person1#: Yes, well, you haven’t had
one for 5 years. You should have one every year.\n#Person2#: I know. I
figure as long as there is nothing wrong, why go see the
doctor?\n#Person1#: Well, the best way to avoid serious illnesses is
to find out about them early. So try to come at least once a year for
your own good.\n#Person2#: Ok.\n#Person1#: Let me see here. Your eyes
and ears look fine. Take a deep breath, please. Do you smoke, Mr.
Smith?\n#Person2#: Yes.\n#Person1#: Smoking is the leading cause of
lung cancer and heart disease, you know. You really should
quit.\n#Person2#: I’ve tried hundreds of times, but I just can’t seem
to kick the habit.\n#Person1#: Well, we have classes and some
medications that might help. I’ll give you more information before you
leave.\n#Person2#: Ok, thanks doctor.”, ‘summary’: “Mr. Smith’s
getting a check-up, and Doctor Hawkins advises him to have one every
year. Hawkins’ll give some information about their classes and
medications to help Mr. Smith quit smoking.”, ‘topic’: ‘get a
check-up’}

# List the keys of a sample
for key in dataset['train'][0]:
    print(key)

id, dialogue, summary, topic

Load the pretrained model

# bitsandbytes config: load the model weights in 4-bit NF4
compute_dtype = getattr(torch, "float16")
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=False,
)

model_name = 'microsoft/phi-2'
device_map = {"": 0}  # place the whole model on GPU 0
original_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map=device_map,
    quantization_config=bnb_config,
    trust_remote_code=True,
    use_auth_token=True,  # uses the saved Hugging Face token, so log in first
    torch_dtype=torch.float16,  # half precision lowers memory use; a possible source of errors on some setups
)

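Load the tokenizer

# tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
    padding_side="left",
    add_eos_token=True,
    add_bos_token=True,
    use_fast=False,
)
tokenizer.pad_token = tokenizer.eos_token
# If special tokens are added to the vocabulary, make sure the corresponding embeddings are fine-tuned or trained

seed = 42  # random seed

tokenizer.pad_token: the padding token. All sequences in a batch must have the same length so that the model can process them together, so shorter sequences are filled with pad_token until they match the longest one. Here the padding goes on the left because padding_side="left", and the code above reuses the EOS token as the padding token.

tokenizer.eos_token: the end-of-sequence token. It marks where a text sequence ends, which matters for models that need to know sequence boundaries. In a translation task, for example, the model needs to know where the input sequence ends before it starts generating the translated output.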
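
The effect of padding on a batch can be sketched without loading any model. The helper `pad_batch` below is hypothetical and only mimics what a tokenizer does when it pads; the token ids are taken from the `input_ids` sample shown later in this post, and using 50256 as the padding id is an assumption based on that sample starting with 50256.

```python
def pad_batch(sequences, pad_id, padding_side="left"):
    """Pad token-id lists to a common length and build attention masks.

    Hypothetical helper mimicking tokenizer padding, for illustration only.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        pads = [pad_id] * n_pad
        if padding_side == "left":
            input_ids.append(pads + list(seq))
            attention_mask.append([0] * n_pad + [1] * len(seq))
        else:
            input_ids.append(list(seq) + pads)
            attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

PAD_ID = 50256  # assumption: the EOS id, reused as the padding id
batch = pad_batch([[21106, 318, 281], [4876]], pad_id=PAD_ID)
print(batch["input_ids"])
print(batch["attention_mask"])
```

The attention mask is 0 at padded positions, so the model ignores them. Left padding keeps the real tokens at the end of each row, which is convenient for causal models that generate a continuation from the last position.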

Preprocess the dataset

def create_prompt_formats(sample):
    """
    Format the fields of a sample (instruction, dialogue, summary)
    and join them with two newline characters.
    :param sample: a dataset sample (dict)
    """
    INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
    INSTRUCTION_KEY = "### Instruct: Summarize the below conversation."
    RESPONSE_KEY = "### Output:"
    END_KEY = "### End"

    blurb = f"\n{INTRO_BLURB}"
    instruction = f"{INSTRUCTION_KEY}"
    input_context = f"{sample['dialogue']}" if sample["dialogue"] else None
    response = f"{RESPONSE_KEY}\n{sample['summary']}"
    end = f"{END_KEY}"

    # Drop empty parts, then join the rest with blank lines
    parts = [part for part in [blurb, instruction, input_context, response, end] if part]

    formatted_prompt = "\n\n".join(parts)
    sample["text"] = formatted_prompt
    # Debug output: print each formatted sample
    print('*'*10)
    print(sample)
    print('*'*10)
    return sample

from functools import partial
# partial() fixes some arguments of a function in advance, which simplifies later calls and avoids repeated code.

def get_max_length(model):
    # Read the maximum sequence length from the model config
    conf = model.config
    max_length = None
    for length_setting in ["n_positions", "max_position_embeddings", "seq_length"]:
        # Use the first of these settings that the config defines
        max_length = getattr(conf, length_setting, None)
        if max_length:
            print(f"Found max length: {max_length}")
            break
    if not max_length:
        max_length = 1024
        print(f"Using default max length: {max_length}")
    return max_length

def preprocess_batch(batch, tokenizer, max_length):
    """
    Tokenize a batch of samples.
    """
    return tokenizer(
        batch["text"],
        max_length=max_length,
        truncation=True,
    )

# For details, see https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
def preprocess_dataset(tokenizer: AutoTokenizer, max_length: int, seed, dataset):
    """Format and tokenize the dataset so it is ready for training.
    :param tokenizer (AutoTokenizer): the model tokenizer
    :param max_length (int): maximum number of tokens emitted by the tokenizer
    """

    # Add prompt to each sample
    print("Preprocessing dataset...")

    dataset = dataset.map(create_prompt_formats)
    print(dataset)
    # Tokenize each batch and remove the 'id', 'topic', 'dialogue', and 'summary' columns.
    # partial() fixes max_length and tokenizer so that map() only has to pass the batch.
    _preprocessing_function = partial(preprocess_batch, max_length=max_length, tokenizer=tokenizer)
    dataset = dataset.map(
        _preprocessing_function,
        batched=True,
        remove_columns=['id', 'topic', 'dialogue', 'summary'],
    )

    # Keep only samples whose input_ids are shorter than max_length
    dataset = dataset.filter(lambda sample: len(sample["input_ids"]) < max_length)

    # Shuffle the dataset
    dataset = dataset.shuffle(seed=seed)

    return dataset

## Pre-process dataset
max_length = get_max_length(original_model)
print(max_length)

train_dataset = preprocess_dataset(tokenizer, max_length, seed, dataset['train'])
eval_dataset = preprocess_dataset(tokenizer, max_length, seed, dataset['validation'])

train_dataset[0]['text']

Inspect the formatted text

“\nBelow is an instruction that describes a task. Write a response
that appropriately completes the request.\n\n### Instruct: Summarize
the below conversation.\n\n#Person1#: How are your French lessons
going?\n#Person2#: Well, I’m no longer taking French
lessons.\n#Person1#: Are you kidding? You told me you made up your
mind to study French well this summer. Didn’t you sign up for the
four-week course?\n#Person2#: I did. But the teacher told me not to
come back any more after only one week and he returned my money for
the remaining three weeks.\n#Person1#: How come? I’ve never heard of a
case like that before. Did you have a quarrel with your
teacher?\n#Person2#: Of course not. At first everything went well and
he was satisfied with me. But he got angry after I broke the class
rules several times.\n#Person1#: It was your fault, I think. You’d
gone too far.\n#Person2#: Perhaps. But I don’t understand why he told
me to stop coming. He was very kind, you know.\n#Person1#: Just forget
it.\n\n### Output:\n#Person2# is no longer taking French lessons
because #Person2# has been kicked out for broking the class rules
several times. #Person1# comforts #Person2#.\n\n### End”

train_dataset[0]['input_ids'][0:10]

[50256, 198, 21106, 318, 281, 12064, 326, 8477, 257, 4876]

train_dataset[0]['attention_mask'][0:10]

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

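Load the PEFT module

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
# See https://huggingface.co/docs/peft/quicktour for details
original_model = prepare_model_for_kbit_training(original_model)

PEFT provides parameter-efficient methods for fine-tuning large pretrained models. The traditional paradigm is to fine-tune all of a model's parameters for each downstream task, but with the enormous parameter counts of today's models this has become extremely costly and impractical. It is more efficient to train a smaller number of prompt parameters, or to use a reparameterization method such as low-rank adaptation (LoRA) to reduce the number of trainable parameters.

Each PEFT method is defined by a PeftConfig class that stores all the important parameters for building a PeftModel. For example, training with LoRA uses a LoraConfig with the following parameters:

task_type: the task to train for (causal language modeling in this example)
inference_mode: whether the model is being used for inference
r: the rank (dimension) of the low-rank matrices
lora_alpha: the scaling factor for the low-rank matrices
lora_dropout: the dropout probability of the LoRA layers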

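# LoRA configuration
config = LoraConfig(
    r=32,  # rank of the low-rank matrices
    lora_alpha=32,  # scaling factor for the low-rank matrices
    # target_modules: names of the modules to apply the adapter to. If specified, only modules with these names are replaced. When a string is passed, a regex match is performed.
    target_modules=[
        'q_proj',
        'k_proj',
        'v_proj',
        'dense'
    ],
    bias="none",  # bias type for LoRA: "none", "all", or "lora_only". With "all" or "lora_only", the corresponding biases are updated during training.
    lora_dropout=0.05,  # dropout probability of the LoRA layers
    task_type="CAUSAL_LM",  # causal language modeling (encoder-decoder models would use "SEQ_2_SEQ_LM")
)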

# Enable gradient checkpointing to reduce memory use during fine-tuning
original_model.gradient_checkpointing_enable()

# Wrap the base model with the LoRA configuration and return a PeftModel
peft_model = get_peft_model(original_model, config)

# Print parameter counts
def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"\nTrainable model parameters: {trainable_model_params}\nAll model parameters: {all_model_params}"

print(print_number_of_trainable_model_parameters(peft_model))

Trainable model parameters: 20971520
All model parameters: 1542364160
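
The trainable parameter count can be checked by hand. A LoRA adapter on a linear layer of shape (in_features, out_features) adds two matrices of shapes r × in_features and out_features × r. The sketch below assumes, and these figures are not stated in this post, that phi-2 has a hidden size of 2560 and 32 transformer layers, and that each of the four targeted modules per layer is a 2560 × 2560 projection.

```python
def lora_param_count(in_features, out_features, r):
    """Number of weights in one LoRA adapter: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)

HIDDEN_SIZE = 2560      # assumption: phi-2 hidden size
NUM_LAYERS = 32         # assumption: number of phi-2 transformer layers
TARGETS_PER_LAYER = 4   # q_proj, k_proj, v_proj, dense (from target_modules)
R = 32                  # r in LoraConfig

per_module = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, R)
total = per_module * TARGETS_PER_LAYER * NUM_LAYERS
print(per_module)  # 163840
print(total)       # 20971520
```

Under these assumptions the total matches the 20971520 trainable parameters reported above. With bias="none" no bias terms are trained, so the adapters account for all trainable weights.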

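Set the training arguments

output_dir = f'./output_model/peft-dialogue-summary-training-{str(int(time.time()))}'  # output path
import transformers

"""
First define the training hyperparameters in TrainingArguments.
Most values can be changed to suit your needs.
"""
# https://huggingface.co/docs/transformers/v4.40.0/en/main_classes/trainer#transformers.TrainingArguments
peft_training_args = TrainingArguments(
    output_dir=output_dir,  # where checkpoints are written
    warmup_steps=1,  # number of steps for the linear warmup from 0 to learning_rate
    per_device_train_batch_size=1,  # batch size per GPU/XPU/TPU/MPS/NPU core/CPU for training
    gradient_accumulation_steps=4,  # number of steps over which gradients are accumulated before a backward/update pass
    # With gradient accumulation, one step counts as one step with a backward pass, so logging, evaluation, and saving happen every gradient_accumulation_steps * xxx_step training examples.
    max_steps=400,  # total number of training steps
    learning_rate=2e-4,  # learning rate
    optim="paged_adamw_8bit",  # paged 8-bit AdamW optimizer; AdamW adds weight decay to Adam, a regularization technique that helps prevent overfitting
    logging_steps=50,  # (int or float, optional, defaults to 500) number of update steps between two logs if logging_strategy="steps"; an integer or a float in [0, 1). A value below 1 is interpreted as a ratio of total training steps.
    logging_dir="./logs",  # where logs are written
    save_strategy="steps",  # checkpoint saving strategy: "no" (no saving during training), "epoch" (save at the end of each epoch), or "steps" (save every save_steps)
    save_steps=50,  # save a checkpoint every 50 steps
    evaluation_strategy="steps",  # evaluation strategy, with the same options as save_strategy
    eval_steps=50,  # evaluate every 50 steps
    do_eval=True,  # run evaluation on the validation set
    gradient_checkpointing=True,  # if True, use gradient checkpointing to save memory at the cost of a slower backward pass
    report_to="none",  # integrations to report results and logs to: "all" for every installed integration, "none" for no integrations
    overwrite_output_dir=True,  # if True, overwrite the contents of the output directory
    group_by_length=True,  # group training samples of roughly the same length together
)


peft_model.config.use_cache = False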
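
A quick calculation shows what these settings imply for the run. This is a sketch under one assumption that the post does not state explicitly, namely that training runs on a single GPU (consistent with device_map = {"": 0}); the variable names mirror the TrainingArguments above.

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
num_devices = 1   # assumption: a single GPU
max_steps = 400
save_steps = 50

# Samples that contribute to each optimizer update
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
# Samples consumed over the whole run
samples_seen = effective_batch_size * max_steps
# Steps at which a checkpoint directory (checkpoint-<step>) is written
checkpoint_steps = list(range(save_steps, max_steps + 1, save_steps))

print(effective_batch_size)  # 4
print(samples_seen)          # 1600
print(checkpoint_steps)      # [50, 100, 150, 200, 250, 300, 350, 400]
```

Gradient accumulation gives the optimizer a batch of 4 while only one sample at a time has to fit in GPU memory, and the checkpoint list includes step 350, the checkpoint used for inference later on.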

Configure the trainer

peft_trainer = transformers.Trainer(
    model=peft_model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    args=peft_training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

Start training

peft_trainer.train()

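Training results:
[Figure: training log]
After step 350 the validation loss decreases only slowly, so the checkpoint from step 350 is used for the tests that follow. Once the model has been trained, it can be used for inference.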

Inference and human evaluation

Load the dependencies

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    Trainer,
    GenerationConfig
)

Load the base model

# Define the quantization config before loading the model that uses it
compute_dtype = getattr(torch, "float16")
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=False,
)

base_model_id = "microsoft/phi-2"
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map='auto',
    quantization_config=bnb_config,
    trust_remote_code=True,
    use_auth_token=True,
)

Load the tokenizer

eval_tokenizer = AutoTokenizer.from_pretrained(base_model_id, add_bos_token=True, trust_remote_code=True, use_fast=False)
eval_tokenizer.pad_token = eval_tokenizer.eos_token

Load the PEFT adapter

from peft import PeftModel

# Checkpoint saved during training; the timestamp in the directory name differs for every run
ft_model = PeftModel.from_pretrained(base_model, "./output_model/peft-dialogue-summary-training-1713691307/checkpoint-350", torch_dtype=torch.float16, is_trainable=False)

Load the evaluation data

from datasets import load_dataset

huggingface_dataset_name = "neil-code/dialogsum-test"
dataset = load_dataset(huggingface_dataset_name)  # reuses the local cache if the data was downloaded before
# Inspect one example: print(dataset['train'][0])

seed = 42  # random seed
# %%time  (Jupyter cell magic; only valid as the first line of a notebook cell)
from transformers import set_seed
set_seed(seed)

index = 5
dialogue = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

prompt = f"Instruct: Summarize the following conversation.\n{dialogue}\nOutput:\n"
print(prompt)

“Instruct: Summarize the following conversation.\n#Person1#: You’re finally here! What took so long?\n#Person2#: I got stuck in traffic again. There was a terrible traffic jam near the Carrefour intersection.\n#Person1#: It’s always rather congested down there during rush hour. Maybe you should try to find a different route to get home.\n#Person2#: I don’t think it can be avoided, to be honest.\n#Person1#: perhaps it would be better if you started taking public transport system to work.\n#Person2#: I think it’s something that I’ll have to consider. The public transport system is pretty good.\n#Person1#: It would be better for the environment, too.\n#Person2#: I know. I feel bad about how much my car is adding to the pollution problem in this city.\n#Person1#: Taking the subway would be a lot less stressful than driving as well.\n#Person2#: The only problem is that I’m going to really miss having the freedom that you have with a car.\n#Person1#: Well, when it’s nicer outside, you can start biking to work. That will give you just as much freedom as your car usually provides.\n#Person2#: That’s true. I could certainly use the exercise!\n#Person1#: So, are you going to quit driving to work then?\n#Person2#: Yes, it’s not good for me or for the environment.\nOutput:\n”

Generate a summary from the prompt

def gen(model, p, maxlen=100, sample=True):
    # Tokenize the prompt, generate up to maxlen new tokens, and decode the result
    toks = eval_tokenizer(p, return_tensors="pt")
    res = model.generate(**toks.to("cuda"), max_new_tokens=maxlen, do_sample=sample, num_return_sequences=1, temperature=0.1, num_beams=1, top_p=0.95).to('cpu')
    return eval_tokenizer.batch_decode(res, skip_special_tokens=True)

peft_model_res = gen(ft_model, prompt, 100)

# Keep the text after "Output:\n", then cut it at the first "###" marker
peft_model_output = peft_model_res[0].split('Output:\n')[1]
#print(peft_model_output)
prefix, success, result = peft_model_output.partition('###')

dash_line = '-' * 99

print(dash_line)
print(f'INPUT PROMPT:\n{prompt}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'PEFT MODEL:\n{prefix}')

---------------------------------------------------------------------------------------------------
INPUT PROMPT:
Instruct: Summarize the following conversation.
#Person1#: You’re finally here! What took so long?
#Person2#: I got stuck in traffic again. There was a terrible traffic jam near the Carrefour intersection.
#Person1#: It’s always rather congested down there during rush hour. Maybe you should try to find a different route to get home.
#Person2#: I don’t think it can be avoided, to be honest.
#Person1#: perhaps it would be better if you started taking public transport system to work.
#Person2#: I think it’s something that I’ll have to consider. The public transport system is pretty good.
#Person1#: It would be better for the environment, too.
#Person2#: I know. I feel bad about how much my car is adding to the pollution problem in this city.
#Person1#: Taking the subway would be a lot less stressful than driving as well.
#Person2#: The only problem is that I’m going to really miss having the freedom that you have with a car.
#Person1#: Well, when it’s nicer outside, you can start biking to work. That will give you just as much freedom as your car usually provides.
#Person2#: That’s true. I could certainly use the exercise!
#Person1#: So, are you going to quit driving to work then?
#Person2#: Yes, it’s not good for me or for the environment.
Output:

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person2# complains to #Person1# about the traffic jam, #Person1# suggests quitting driving and taking public transportation instead.

---------------------------------------------------------------------------------------------------
PEFT MODEL:
#Person2# got stuck in traffic again and #Person1# suggests #Person2# should take public transport system to work. #Person2# agrees and will start biking to work.

#Person1# and #Person2# are talking about #Person2#'s traffic jam.
#Person1# suggests #Person2# should take public transport system to work.
#Person2# agrees and will start biking to work.

#Person1# and

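Quantitative evaluation (ROUGE metric)

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics and a software package for evaluating automatic summarization and machine translation in natural language processing.

The metrics compare an automatically generated summary or translation against a human-written reference summary or translation, or against a set of such references.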
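
As a rough illustration of what ROUGE-1 measures, the sketch below computes unigram overlap between a prediction and a reference in plain Python. It is a simplification: it splits on whitespace and lowercases, whereas the `evaluate` ROUGE implementation used in this post applies its own tokenization and optional stemming, so the numbers will not match exactly. The function name `rouge1` and the example sentences are hypothetical.

```python
from collections import Counter

def rouge1(prediction, reference):
    """Unigram-overlap precision, recall, and F1 (simplified ROUGE-1)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Clipped overlap: each word counts at most as often as it appears in both texts
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

scores = rouge1("the cat sat on the mat", "the cat lay on the mat")
print(scores)
```

In this example five of the six words overlap, so precision, recall, and F1 all come out at 5/6. ROUGE-2 applies the same idea to word pairs, and ROUGE-L uses the longest common subsequence instead of n-gram counts.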

Load the original model

original_model = AutoModelForCausalLM.from_pretrained(base_model_id, 
                                                      device_map='auto',
                                                      quantization_config=bnb_config,
                                                      trust_remote_code=True,
                                                      use_auth_token=True)

import pandas as pd

dialogues = dataset['test'][0:10]['dialogue']
human_baseline_summaries = dataset['test'][0:10]['summary']

original_model_summaries = []
instruct_model_summaries = []
peft_model_summaries = []

# Generate summaries with the original model and the fine-tuned model

for idx, dialogue in enumerate(dialogues):
    human_baseline_text_output = human_baseline_summaries[idx]
    prompt = f"Instruct: Summarize the following conversation.\n{dialogue}\nOutput:\n"
    
    original_model_res = gen(original_model,prompt,100,)
    original_model_text_output = original_model_res[0].split('Output:\n')[1]
    
    peft_model_res = gen(ft_model,prompt,100,)
    peft_model_output = peft_model_res[0].split('Output:\n')[1]
    print(peft_model_output)
    peft_model_text_output, success, result = peft_model_output.partition('###')

    original_model_summaries.append(original_model_text_output)
    peft_model_summaries.append(peft_model_text_output)

zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries, peft_model_summaries))
 
df = pd.DataFrame(zipped_summaries, columns = ['human_baseline_summaries', 'original_model_summaries', 'peft_model_summaries'])

Sample of the results
[Figure: sample rows of the results table]

Compute the ROUGE scores

import evaluate
rouge = evaluate.load('rouge')

original_model_results = rouge.compute(
    predictions=original_model_summaries,
    references=human_baseline_summaries[0:len(original_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

peft_model_results = rouge.compute(
    predictions=peft_model_summaries,
    references=human_baseline_summaries[0:len(peft_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

Print the results

print('ORIGINAL MODEL:')
print(original_model_results)
print('PEFT MODEL:')
print(peft_model_results)

print("Absolute percentage improvement of PEFT MODEL over ORIGINAL MODEL")

ORIGINAL MODEL:
{'rouge1': 0.2982172181627305, 'rouge2': 0.10884949396933342, 'rougeL': 0.2265921122820806, 'rougeLsum': 0.2370444112189944}
PEFT MODEL:
{'rouge1': 0.3454568975970058, 'rouge2': 0.10093065901837833, 'rougeL': 0.27296572386050966, 'rougeLsum': 0.2632051713333925}
Absolute percentage improvement of PEFT MODEL over ORIGINAL MODEL

Change in ROUGE scores

# Absolute difference between the PEFT and original scores, printed in percentage points
improvement = (np.array(list(peft_model_results.values())) - np.array(list(original_model_results.values())))
for key, value in zip(peft_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

rouge1: 4.72%
rouge2: -0.79%
rougeL: 4.64%
rougeLsum: 2.62%

That is everything for this example. If you have questions, feel free to leave a comment.