Cognitive Behaviors That Enable Self-Improving Reasoners
arxiv.org Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Test-time inference has emerged as a powerful paradigm for enabling language models to "think" longer and more carefully about complex challenges, much like skilled human experts. While reinforcement learning (RL) can drive self-improvement in language models on verifiable tasks, some models exhib...
