Sparse Mixture of Experts Model

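A sparse Mixture of Experts (MoE) model contains a router component that sends each input token to a small subset of expert sub-networks instead of running it through all of them, so only a fraction of the model's parameters is used per token. Mixtral 8x7B is an example: in each layer, its router selects 2 of 8 experts for every token.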
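
The routing step can be illustrated with a minimal sketch in plain Python. It assumes top-k selection followed by a softmax over the selected router scores, which is one common formulation and not necessarily how any particular model is implemented. The function names, the toy experts, and the router scores are all hypothetical and chosen only for illustration.

```python
import math

def top_k_route(router_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Pick the k experts with the largest router scores (ties keep index order).
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected scores only, so the k weights sum to 1.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def sparse_moe(token, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs; other experts never run."""
    return sum(w * experts[i](token) for i, w in top_k_route(router_logits, k))

# Eight toy experts (hypothetical stand-ins for feed-forward blocks):
# expert number s simply multiplies its input by s.
experts = [lambda x, s=s: s * x for s in range(1, 9)]

# Hypothetical router scores for a single token.
logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]

routes = top_k_route(logits, k=2)
output = sparse_moe(3.0, experts, logits, k=2)
print(routes, output)
```

In this sketch only two of the eight experts are evaluated for the token, which is where the compute saving of a sparse model comes from. A real router would compute the scores from the token's hidden state with a learned linear layer; that part is omitted here.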
