Absolute Zero: Reinforced Self-play Reasoning with Zero Data

learn to reason without any human-annotated data.
A classic paper applying neural networks to RL for game playing
A mathematical framework for modelling decision-making under uncertainty