Original post: https://i68.ltd/notes/posts/20250304-llm-fine-tuning-reason/
Knowledge Fusion: FuseAI
- Fuses multiple models to cut training cost while improving reasoning performance
- Paper: [2408.07990] FuseChat: Knowledge Fusion of Chat Models
- Project repo: FuseAI
- FuseO1 is genuinely capable; its problem-solving on advanced math and computer architecture exercises is impressively strong
- Community comment: "FuseO1 32B swept my entire test question bank, while even a 4-bit AWQ quant of R1-70B couldn't crack it"
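FuseChat-style knowledge fusion distills several source chat models into one target by combining their output distributions. A minimal sketch of the core idea, using a weighted average of teacher next-token distributions as the distillation target (the function names and the simple averaging scheme are illustrative assumptions, not FuseChat's actual alignment-and-fusion pipeline):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def fuse_distributions(teacher_logits, weights):
    """Weighted average of several teachers' next-token probability
    distributions; this fused distribution serves as the target the
    student model is distilled toward."""
    probs = np.stack([softmax(l) for l in teacher_logits])
    w = np.asarray(weights)[:, None]
    return (w * probs).sum(axis=0)

def distill_loss(student_logits, fused_probs):
    """Cross-entropy of the student's distribution against the fused
    teacher target (the quantity minimized during fusion training)."""
    log_p = np.log(softmax(student_logits) + 1e-12)
    return -float((fused_probs * log_p).sum())
```

In the real method the teachers' vocabularies must first be aligned token-by-token before their distributions can be combined; that alignment step is elided here.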
LIMO: Less is More for Reasoning
- Paper: LIMO: Less is More for Reasoning
- Project: https://github.com/GAIR-NLP/LIMO
- Overturns conventional wisdom: far more resource-efficient than DeepSeek R1, the LIMO model achieves strong reasoning from a small amount of data, pushing past the usual limits of SFT
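LIMO's thesis is that a few hundred carefully curated examples can beat massive SFT corpora. The curation itself is the hard part; as a toy sketch, score-based selection of a small high-quality subset might look like the following (`score_fn` is a hypothetical stand-in for LIMO's manual, multi-criteria filtering of question difficulty and reasoning-chain quality):

```python
def select_limo_subset(examples, score_fn, k):
    """Keep only the k highest-scoring training examples for SFT.
    In LIMO the selection criteria are hand-designed (difficulty,
    generality, chain-of-thought quality); here a single callable
    scorer stands in for that process."""
    return sorted(examples, key=score_fn, reverse=True)[:k]

# Illustrative usage: rank candidate examples by reasoning-chain length.
candidates = [
    {"q": "a", "cot_steps": 3},
    {"q": "b", "cot_steps": 10},
    {"q": "c", "cot_steps": 7},
]
subset = select_limo_subset(candidates, lambda ex: ex["cot_steps"], k=2)
```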
Rethinking Compute-Optimal Test-Time Scaling
- Paper: https://arxiv.org/pdf/2502.06703
- Project: https://github.com/RyanLiu112/compute-optimal-tts
- A 1B model beats a 405B giant: a new breakthrough from Shanghai AI Lab
- With compute-optimal test-time scaling (TTS) strategies, small language models can significantly outperform far larger ones
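One common TTS strategy studied in this line of work is sampling many candidate solutions and aggregating them with a reward model. A minimal sketch of reward-weighted voting over sampled answers (`generate`, `extract_answer`, and `score` are hypothetical callables standing in for a policy model, an answer parser, and a process reward model):

```python
from collections import defaultdict

def weighted_best_of_n(prompt, generate, extract_answer, score, n=8):
    """Sample n candidate solutions, score each with a reward model,
    and return the final answer with the highest total score (i.e.
    majority voting weighted by reward, one of the TTS strategies
    compared in the paper)."""
    totals = defaultdict(float)
    for _ in range(n):
        solution = generate(prompt)
        totals[extract_answer(solution)] += score(prompt, solution)
    return max(totals, key=totals.get)
```

The compute-optimal question is then how to split a fixed inference budget between model size and the number of samples n; the paper's finding is that a small model with a large n can win.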
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
- Unleashing LLM reasoning with rule-based reinforcement learning
- Paper: https://arxiv.org/pdf/2502.14768
- Project repo: https://github.com/Unakar/Logic-RL
- REINFORCE++ beats DeepSeek's GRPO: a Microsoft team uses logic puzzles to reveal the "aha moment" in large models
- Trained on logic puzzles alone, math-competition ability soars: Microsoft and Ubiquant show a 7B model can approach o3-mini
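The "rule-based" part means the RL reward is computed by deterministic checks rather than a learned reward model: one component for following the required `<think>`/`<answer>` output format, one for answer correctness. A minimal sketch (the exact reward magnitudes here are illustrative assumptions, not the paper's tuned values):

```python
import re

def rule_based_reward(response: str, gold_answer: str) -> float:
    """Deterministic reward in the spirit of Logic-RL: a format reward
    for producing <think>...</think><answer>...</answer>, plus an
    answer reward when the extracted answer matches the gold label."""
    fmt_ok = bool(re.search(r"<think>.*</think>\s*<answer>.*</answer>",
                            response, re.S))
    m = re.search(r"<answer>(.*?)</answer>", response, re.S)
    ans_ok = bool(m) and m.group(1).strip() == gold_answer
    return (1.0 if fmt_ok else -1.0) + (2.0 if ans_ok else 0.0)
```

Because the checks are rules rather than a model, the reward cannot be gamed the way a learned reward model can, which is part of why logic puzzles with verifiable answers make a clean RL testbed.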
TinyR1-32B-Preview
- Built with 360-LLaMA-Factory, combining DeepSeek-R1 distillation, continued training of DeepSeek-R1-Distill-32B, and model merging
- A joint release from 360 and Peking University: with 5% of the parameters, it approaches the performance of full DeepSeek-R1
- Tiny-R1-32B-Preview
- Reproducing OpenAI o1: training ultra-long chains of thought with 360 LLaMA Factory
- 360-LLaMA-Factory
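The model-merging step mentioned above combines several fine-tuned checkpoints of the same base architecture into one set of weights. The simplest scheme is a weighted average of parameters; a dependency-free sketch over plain parameter dicts (real merging toolchains operate on torch tensors and support more elaborate schemes, and TinyR1's exact recipe is not spelled out here):

```python
def merge_checkpoints(state_dicts, weights):
    """Linear model merging: a weighted average of parameter values
    across checkpoints that share the same architecture and keys.
    Plain floats stand in for tensors to keep the sketch minimal."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights should sum to 1"
    return {key: sum(w * sd[key] for w, sd in zip(weights, state_dicts))
            for key in state_dicts[0]}

# Illustrative usage: average a math-specialist and a code-specialist
# checkpoint with equal weight.
math_ckpt = {"layer.0.weight": 1.0}
code_ckpt = {"layer.0.weight": 3.0}
merged = merge_checkpoints([math_ckpt, code_ckpt], [0.5, 0.5])
```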