April 15, 2026

Teaching LLMs When to Think: Self-Routing Think/No-Think via RL

A training framework that teaches LLMs to autonomously decide whether to use chain-of-thought reasoning or answer directly, via paired counterfactual rollouts and curriculum-based RL.

LLM, Reinforcement Learning, Chain-of-Thought, Adaptive Compute