DeepSeek-R1-Zero

From the paper "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning": a model trained to reason through reinforcement learning alone, without supervised fine-tuning on reasoning traces.