Neural Codec Language Models are Zero-Shot Text-to-Speech Synthesizers

VALL-E can synthesize speech in a previously unseen speaker's voice from only a 3-second recording of that speaker and the text to be spoken
An activation function for modelling data with periodicity (repeating patterns)