Blog | Ruochen Jiao

April 15, 2026

Teaching LLMs When to Think: Self-Routing Think/No-Think via RL

A training framework that uses paired counterfactual rollouts and curriculum-based RL to teach LLMs to decide autonomously whether to use chain-of-thought reasoning or answer directly.

LLM, Reinforcement Learning, Chain-of-Thought, Adaptive Compute