Mixtral of Experts (Jan 2024)
a Sparse Mixture of Experts (SMoE) language model
a comprehensive evaluation of o1-preview across many tasks and domains.
LLMs can help and also hinder learning outcomes
System 2 thinking is characterised by slow, deliberate, and logical reasoning, requiring conscious effort and attention to solve complex problems. Unlike the intuitive nature of …
a paper showing that a model needs to see a concept exponentially more times to achieve linear improvements in performance
an approach to utilising LLMs that involves multi-state interactions.
A data visualization that uses squares on a 2D grid to represent proportions.
The specific self-attention formulation from the Transformer paper, distinguished by dividing the attention scores by the square root of the key dimension.
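
The formulation above is usually written as Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. A minimal sketch of it in plain Python follows, assuming Q, K and V are lists of equal-length rows (one row per token) and ignoring masking, batching and multiple heads. The function names are illustrative and are not taken from any library.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Return softmax(Q K^T / sqrt(d_k)) V for list-of-lists inputs.

    Q has one row per query; K and V have one row per key/value pair.
    d_k is the length of each key (and query) vector.
    """
    d_k = len(K[0])
    scale = math.sqrt(d_k)
    output = []
    for q in Q:
        # Dot product of the query with every key, divided by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        weights = softmax(scores)
        # Weighted sum of the value rows, one output coordinate at a time.
        output.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return output
```

For example, a query that is orthogonal to every key gives all keys the same score, so the result is the plain average of the value rows. Real implementations operate on batched tensors and usually add masking and dropout, which this sketch leaves out.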