SmolLM2: New Best Small Models for On-Device Applications


SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on device.


The 1.7B variant shows significant advances over its predecessor SmolLM1-1.7B, particularly in instruction following, knowledge, reasoning, and mathematics. It was trained on 11 trillion tokens using a diverse combination of datasets: FineWeb-Edu, DCLM, The Stack, along with new mathematics and coding datasets that we curated and will release soon. We developed the instruct version through supervised fine-tuning (SFT), using a combination of public datasets and our own curated datasets. We then applied Direct Preference Optimization (DPO) using UltraFeedback.
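To make the post-training step concrete, the sketch below shows how a DPO pass on a binarized UltraFeedback dataset could be run with TRL's DPOTrainer. It is an illustration only, not the authors' actual recipe: the dataset name and split, the hyperparameters, and the use of the released Instruct checkpoint as a stand-in for the SFT checkpoint are all assumptions, and a recent TRL version is assumed.

# Illustrative DPO sketch with TRL; not the actual SmolLM2 training configuration
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"  # stand-in; in practice the SFT checkpoint would be the starting point
model = AutoModelForCausalLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Binarized UltraFeedback preference pairs (assumed dataset name and split)
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

# Placeholder hyperparameters, not the values used for SmolLM2
args = DPOConfig(output_dir="smollm2-dpo", beta=0.1, per_device_train_batch_size=2)
trainer = DPOTrainer(model=model, args=args, train_dataset=dataset, processing_class=tokenizer)
trainer.train()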

Thanks to datasets such as Synth-APIGen-v0.1 developed by Argilla, the instruct model also supports tasks such as text rewriting, summarization, and function calling.
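As a quick illustration of the summarization use case, the snippet below reuses the same chat-template generation pattern shown in the Demo sections; the prompt text is just an example.

# Illustrative summarization prompt; same API as the Demo code below
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

text = "SmolLM2 is a family of compact language models available in 135M, 360M, and 1.7B sizes, light enough to run on device."
messages = [{"role": "user", "content": f"Summarize in one sentence: {text}"}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(input_text, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=60, do_sample=False)
print(tokenizer.decode(outputs[0]))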

Base pre-trained model (SmolLM2-1.7B)

| Metric | SmolLM2-1.7B | Llama-1B | Qwen2.5-1.5B | SmolLM1-1.7B |
|---|---|---|---|---|
| HellaSwag | 68.7 | 61.2 | 66.4 | 62.9 |
| ARC (Average) | 60.5 | 49.2 | 58.5 | 59.9 |
| PIQA | 77.6 | 74.8 | 76.1 | 76.0 |
| MMLU-Pro (MCF) | 19.4 | 11.7 | 13.7 | 10.8 |
| CommonsenseQA | 43.6 | 41.2 | 34.1 | 38.0 |
| TriviaQA | 36.7 | 28.1 | 20.9 | 22.5 |
| Winogrande | 59.4 | 57.8 | 59.3 | 54.7 |
| OpenBookQA | 42.2 | 38.4 | 40.0 | 42.4 |
| GSM8K (5-shot) | 31.0 | 7.2 | 61.3 | 5.5 |

Instruction model (SmolLM2-1.7B-Instruct)

| Metric | SmolLM2-1.7B-Instruct | Llama-1B-Instruct | Qwen2.5-1.5B-Instruct | SmolLM1-1.7B-Instruct |
|---|---|---|---|---|
| IFEval (Average prompt/inst) | 56.7 | 53.5 | 47.4 | 23.1 |
| MT-Bench | 6.13 | 5.48 | 6.52 | 4.33 |
| OpenRewrite-Eval (micro_avg RougeL) | 44.9 | 39.2 | 46.9 | NaN |
| HellaSwag | 66.1 | 56.1 | 60.9 | 55.5 |
| ARC (Average) | 51.7 | 41.6 | 46.2 | 43.7 |
| PIQA | 74.4 | 72.3 | 73.2 | 71.6 |
| MMLU-Pro (MCF) | 19.3 | 12.7 | 24.2 | 11.7 |
| BBH (3-shot) | 32.2 | 27.6 | 35.3 | 25.7 |
| GSM8K (5-shot) | 48.2 | 26.8 | 42.8 | 4.62 |

Demo

pip install transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"

device = "cuda" # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "What is the capital of France."}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))

Chat in TRL

pip install trl
trl chat --model_name_or_path HuggingFaceTB/SmolLM2-1.7B-Instruct --device cpu

Base pre-trained model (SmolLM2-360M)

| Metric | SmolLM2-360M | Qwen2.5-0.5B | SmolLM-360M |
|---|---|---|---|
| HellaSwag | 54.5 | 51.2 | 51.8 |
| ARC (Average) | 53.0 | 45.4 | 50.1 |
| PIQA | 71.7 | 69.9 | 71.6 |
| MMLU (cloze) | 35.8 | 33.7 | 34.4 |
| CommonsenseQA | 38.0 | 31.6 | 35.3 |
| TriviaQA | 16.9 | 4.3 | 9.1 |
| Winogrande | 52.5 | 54.1 | 52.8 |
| OpenBookQA | 37.4 | 37.4 | 37.2 |
| GSM8K (5-shot) | 3.2 | 33.4 | 1.6 |

Instruction model (SmolLM2-360M-Instruct)

| Metric | SmolLM2-360M-Instruct | Qwen2.5-0.5B-Instruct | SmolLM-360M-Instruct |
|---|---|---|---|
| IFEval (Average prompt/inst) | 41.0 | 31.6 | 19.8 |
| MT-Bench | 3.66 | 4.16 | 3.37 |
| HellaSwag | 52.1 | 48.0 | 47.9 |
| ARC (Average) | 43.7 | 37.3 | 38.8 |
| PIQA | 70.8 | 67.2 | 69.4 |
| MMLU (cloze) | 32.8 | 31.7 | 30.6 |
| BBH (3-shot) | 27.3 | 30.7 | 24.4 |
| GSM8K (5-shot) | 7.43 | 26.8 | 1.36 |

Demo

pip install transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "HuggingFaceTB/SmolLM2-360M-Instruct"

device = "cuda" # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "What is the capital of France."}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
print(input_text)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))

Chat in TRL

pip install trl
trl chat --model_name_or_path HuggingFaceTB/SmolLM2-360M-Instruct --device cpu
