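SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They can solve a wide range of tasks while remaining lightweight enough to run on-device.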
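As a rough illustration of what "lightweight enough to run on-device" means, the sketch below estimates the memory needed just to hold the weights at a few common precisions. It uses the nominal parameter counts quoted above and assumes 4, 2, and 0.5 bytes per parameter for fp32, bf16, and 4-bit quantization. Activations, the KV cache, and runtime overhead are ignored, so real usage will be higher. The function and dictionary names are illustrative only.

```python
# Nominal parameter counts from the text above (exact counts may differ slightly).
NOMINAL_PARAMS = {"SmolLM2-135M": 135e6, "SmolLM2-360M": 360e6, "SmolLM2-1.7B": 1.7e9}

# Assumed storage cost per parameter for a few common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int4": 0.5}


def weight_memory_gb(num_params, dtype):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, n in NOMINAL_PARAMS.items():
    sizes = ", ".join(f"{d}: {weight_memory_gb(n, d):.2f} GB" for d in BYTES_PER_PARAM)
    print(f"{name} -> {sizes}")
```

Under these assumptions the 1.7B model needs roughly 3.4 GB for bf16 weights, while the 135M model needs about 0.27 GB.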
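The 1.7B variant improves significantly on its predecessor, SmolLM1-1.7B, particularly in instruction following, knowledge, reasoning, and mathematics. It was trained on 11 trillion tokens using a diverse dataset combination: FineWeb-Edu, DCLM, The Stack, and new mathematics and coding datasets that we curated and will release soon. We developed the instruct version through supervised fine-tuning (SFT) on a combination of public datasets and our own curated datasets. We then applied Direct Preference Optimization (DPO) using UltraFeedback.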
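For readers unfamiliar with the preference-optimization step, here is a minimal sketch of the standard DPO objective for a single preference pair, written in plain Python. It is not the training code used for SmolLM2. The function name, the `beta` value, and the example log-probabilities are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    # How much more the policy favors each response than the frozen reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written as log(1 + exp(-logits)).
    return math.log1p(math.exp(-logits))


# Policy identical to the reference: nothing learned yet, so the loss is log(2).
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
# Policy has shifted probability toward the chosen response: the loss drops below log(2).
print(dpo_loss(-10.0, -16.0, -12.0, -15.0))
```

The loss falls as the policy assigns relatively more probability to the preferred response than the reference model does, which is the signal a preference dataset such as UltraFeedback provides.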
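The instruct model also supports tasks such as text rewriting, summarization, and function calling, thanks to datasets such as Synth-APIGen-v0.1, developed by Argilla.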
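The sketch below shows one way task-specific requests for rewriting or summarization might be assembled as chat messages before being passed to `tokenizer.apply_chat_template`. The system prompts and the `build_messages` helper are hypothetical and written for illustration. They are not the prompts used to train or evaluate the model, and function calling is left out because its prompt format is not described here.

```python
# Hypothetical system prompts; the wording is an assumption, not taken from the model card.
TASK_PROMPTS = {
    "rewrite": "You are a writing assistant. Rewrite the user's text to be clearer and more professional.",
    "summarize": "You are a summarization assistant. Summarize the user's text in a few sentences.",
}


def build_messages(task, text):
    """Return a chat-format message list for the given task and user text."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    return [
        {"role": "system", "content": TASK_PROMPTS[task]},
        {"role": "user", "content": text},
    ]


messages = build_messages("summarize", "SmolLM2 is a family of compact language models ...")
print(messages)
```

The resulting list has the same shape as the `messages` variable used in the demo code, so it can be fed through the same tokenization and generation steps.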
Base pre-trained model
Metric | SmolLM2-1.7B | Llama-1B | Qwen2.5-1.5B | SmolLM1-1.7B |
---|---|---|---|---|
HellaSwag | 68.7 | 61.2 | 66.4 | 62.9 |
ARC (Average) | 60.5 | 49.2 | 58.5 | 59.9 |
PIQA | 77.6 | 74.8 | 76.1 | 76.0 |
MMLU-Pro (MCF) | 19.4 | 11.7 | 13.7 | 10.8 |
CommonsenseQA | 43.6 | 41.2 | 34.1 | 38.0 |
TriviaQA | 36.7 | 28.1 | 20.9 | 22.5 |
Winogrande | 59.4 | 57.8 | 59.3 | 54.7 |
OpenBookQA | 42.2 | 38.4 | 40.0 | 42.4 |
GSM8K (5-shot) | 31.0 | 7.2 | 61.3 | 5.5 |
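To read the table at a glance, the short script below counts the benchmarks on which each model has the highest score. The numbers are copied from the table above; the variable and function names are illustrative.

```python
from collections import Counter

MODELS = ["SmolLM2-1.7B", "Llama-1B", "Qwen2.5-1.5B", "SmolLM1-1.7B"]

# Scores in the same column order as MODELS.
SCORES = {
    "HellaSwag": [68.7, 61.2, 66.4, 62.9],
    "ARC (Average)": [60.5, 49.2, 58.5, 59.9],
    "PIQA": [77.6, 74.8, 76.1, 76.0],
    "MMLU-Pro (MCF)": [19.4, 11.7, 13.7, 10.8],
    "CommonsenseQA": [43.6, 41.2, 34.1, 38.0],
    "TriviaQA": [36.7, 28.1, 20.9, 22.5],
    "Winogrande": [59.4, 57.8, 59.3, 54.7],
    "OpenBookQA": [42.2, 38.4, 40.0, 42.4],
    "GSM8K (5-shot)": [31.0, 7.2, 61.3, 5.5],
}


def leaders(scores, models):
    """Map each benchmark to the model with the highest score."""
    return {bench: models[row.index(max(row))] for bench, row in scores.items()}


wins = Counter(leaders(SCORES, MODELS).values())
for model in MODELS:
    print(f"{model}: best on {wins[model]} of {len(SCORES)} benchmarks")
```

With these numbers SmolLM2-1.7B leads on 7 of the 9 benchmarks. The exceptions are OpenBookQA, where SmolLM1-1.7B is marginally ahead, and GSM8K, where Qwen2.5-1.5B leads by a wide margin.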
Instruction model
Metric | SmolLM2-1.7B-Instruct | Llama-1B-Instruct | Qwen2.5-1.5B-Instruct | SmolLM1-1.7B-Instruct |
---|---|---|---|---|
IFEval (Average prompt/inst) | 56.7 | 53.5 | 47.4 | 23.1 |
MT-Bench | 6.13 | 5.48 | 6.52 | 4.33 |
OpenRewrite-Eval (micro_avg RougeL) | 44.9 | 39.2 | 46.9 | NaN |
HellaSwag | 66.1 | 56.1 | 60.9 | 55.5 |
ARC (Average) | 51.7 | 41.6 | 46.2 | 43.7 |
PIQA | 74.4 | 72.3 | 73.2 | 71.6 |
MMLU-Pro (MCF) | 19.3 | 12.7 | 24.2 | 11.7 |
BBH (3-shot) | 32.2 | 27.6 | 35.3 | 25.7 |
GSM8K (5-shot) | 48.2 | 26.8 | 42.8 | 4.62 |
Demo
pip install transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
device = "cuda" # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
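messages = [{"role": "user", "content": "What is the capital of France?"}]
# add_generation_prompt=True appends the assistant turn header so the model answers rather than continuing the user turn
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)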
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))
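The demo prints the whole sequence, prompt included. If only the model's answer is wanted, one option is to slice off the prompt tokens before decoding. The helper below is a minimal sketch of that idea. `extract_reply` is a hypothetical name, and a stand-in vocabulary and decoder are used so the snippet runs without downloading a model.

```python
def extract_reply(output_ids, prompt_len, decode):
    """Decode only the tokens that were generated after the prompt."""
    return decode(output_ids[prompt_len:])


# Stand-in data: four prompt tokens followed by two generated tokens.
fake_vocab = {1: "What", 2: "is", 3: "the", 4: "capital", 5: "Paris", 6: "."}
fake_output = [1, 2, 3, 4, 5, 6]


def fake_decode(ids):
    """Toy decoder that joins the stand-in tokens with spaces."""
    return " ".join(fake_vocab[i] for i in ids)


print(extract_reply(fake_output, 4, fake_decode))  # prints "Paris ."
```

With the demo's variables, the equivalent call would be `extract_reply(outputs[0], inputs.shape[1], lambda ids: tokenizer.decode(ids, skip_special_tokens=True))`.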
Chat in TRL
pip install trl
trl chat --model_name_or_path HuggingFaceTB/SmolLM2-1.7B-Instruct --device cpu
Base pre-trained model
Metric | SmolLM2-360M | Qwen2.5-0.5B | SmolLM-360M |
---|---|---|---|
HellaSwag | 54.5 | 51.2 | 51.8 |
ARC (Average) | 53.0 | 45.4 | 50.1 |
PIQA | 71.7 | 69.9 | 71.6 |
MMLU (cloze) | 35.8 | 33.7 | 34.4 |
CommonsenseQA | 38.0 | 31.6 | 35.3 |
TriviaQA | 16.9 | 4.3 | 9.1 |
Winogrande | 52.5 | 54.1 | 52.8 |
OpenBookQA | 37.4 | 37.4 | 37.2 |
GSM8K (5-shot) | 3.2 | 33.4 | 1.6 |
Instruction model
Metric | SmolLM2-360M-Instruct | Qwen2.5-0.5B-Instruct | SmolLM-360M-Instruct |
---|---|---|---|
IFEval (Average prompt/inst) | 41.0 | 31.6 | 19.8 |
MT-Bench | 3.66 | 4.16 | 3.37 |
HellaSwag | 52.1 | 48.0 | 47.9 |
ARC (Average) | 43.7 | 37.3 | 38.8 |
PIQA | 70.8 | 67.2 | 69.4 |
MMLU (cloze) | 32.8 | 31.7 | 30.6 |
BBH (3-shot) | 27.3 | 30.7 | 24.4 |
GSM8K (5-shot) | 7.43 | 26.8 | 1.36 |
Demo
pip install transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "HuggingFaceTB/SmolLM2-360M-Instruct"
device = "cuda" # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
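messages = [{"role": "user", "content": "What is the capital of France?"}]
# add_generation_prompt=True appends the assistant turn header so the model answers rather than continuing the user turn
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)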
print(input_text)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))
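The demo hard-codes `device = "cuda"`, which raises an error on machines without a CUDA GPU. A small guard such as the one below, a sketch that assumes PyTorch is the backend, falls back to the CPU when CUDA is unavailable or PyTorch is not installed. `pick_device` is an illustrative name.

```python
def pick_device():
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to query, so default to the CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

The returned string can replace the hard-coded `device` value in the demo without any other changes.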
Chat in TRL
pip install trl
trl chat --model_name_or_path HuggingFaceTB/SmolLM2-360M-Instruct --device cpu