Fine-Tuning LLMs for Enhanced Reasoning

Original link: https://i68.ltd/notes/posts/20250304-llm-fine-tuning-reason/

Knowledge Fusion: FuseAI

  • Fuses multiple models into one, reducing training cost and improving reasoning performance
  • Paper: [2408.07990] FuseChat: Knowledge Fusion of Chat Models
  • Repository: FuseAI
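  • FuseO1 is genuinely capable: it does very well on practice problems in advanced mathematics and computer architecture
  • User comment: "fuse o1 32b solved every question in my test bank, which even the 4-bit AWQ build of r1-70b could not do"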

LIMO: Less is More for Reasoning

Rethinking Compute-Optimal Test-Time Scaling

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

TinyR1-32B-Preview