Absolute Zero: Reinforced Self-play Reasoning with Zero Data

learn to reason without any human-annotated data.
a classic paper applying neural networks to RL for game playing
A mathematical framework for modelling decision-making under uncertainty
a distribution-based sorting algorithm that works by dividing elements into buckets
Routes LLM tasks to cheaper or more powerful models based on task novelty.