Learning to Reason without External Rewards
aka Self-Confidence is All You Need
learn to reason without any human-annotated data.
A classic paper applying neural networks to RL for game playing
A reinforcement learning algorithm for finding optimal policies
A mathematical framework for modelling decision-making under uncertainty