Chain-of-Thought Reasoning

Chain-of-Thought Reasoning is an LLM technique in which the model reasons in token space before producing its final answer. It was originally described as a prompting technique, Chain-of-Thought Prompting, in which the model was given few-shot input/output examples that included intermediate reasoning. Later, with the introduction of models like OpenAI's o1 and DeepSeek-R1-Zero, models could perform this reasoning without few-shot examples: either by learning to reason during a fine-tuning step (adding reasoning steps to the training data) or via reinforcement learning, where the model is rewarded for applying a thinking process before returning an output.
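As a minimal sketch of the few-shot prompting variant, the prompt below includes a worked example whose answer spells out the intermediate reasoning, so the model is nudged to produce similar step-by-step reasoning before its own answer. The example question and the `build_cot_prompt` helper are illustrative, not from any specific paper or library.

```python
# Few-shot Chain-of-Thought prompting sketch: each demonstration pairs a
# question with an answer that shows its intermediate reasoning steps.
FEW_SHOT_EXAMPLES = [
    {
        "question": "Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
                    "How many balls does he have now?",
        "reasoning": "Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. "
                     "5 + 6 = 11.",
        "answer": "11",
    },
]

def build_cot_prompt(question: str) -> str:
    """Assemble a few-shot prompt whose examples include reasoning steps."""
    parts = []
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(f"Q: {ex['question']}")
        parts.append(f"A: {ex['reasoning']} The answer is {ex['answer']}.")
    parts.append(f"Q: {question}")
    parts.append("A:")  # the model continues here, ideally reasoning first
    return "\n".join(parts)

prompt = build_cot_prompt(
    "A cafeteria had 23 apples. It used 20 and bought 6 more. "
    "How many apples does it have now?"
)
print(prompt)
```

The same idea carries over to the zero-shot setting by dropping the examples and appending a cue such as "Let's think step by step" after the question.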