Zero-Shot TTS

Zero-shot text-to-speech (TTS) models take input text and a sample of audio from a target speaker, and generate new speech in that speaker's voice without any speaker-specific training.
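
The input/output contract described above can be sketched in a few lines of Python. This is a toy illustration only, not a real model or any library's API: the names ZeroShotRequest, speaker_embedding, and synthesize are hypothetical, the 16 kHz sample rate is an assumption, and the "synthesizer" emits a sine tone rather than speech. It only shows the shape of the data flow: reference audio is reduced to a speaker representation, which then conditions generation from text.

```python
import math
from dataclasses import dataclass
from typing import List

SAMPLE_RATE = 16_000  # assumed sample rate in Hz (not from the source)


@dataclass
class ZeroShotRequest:
    """Hypothetical container for the two inputs of a zero-shot TTS model."""
    text: str                     # what should be spoken
    reference_audio: List[float]  # sample audio of the target speaker


def speaker_embedding(audio: List[float]) -> List[float]:
    """Toy stand-in for a speaker encoder: summarizes the clip as [mean, RMS]."""
    if not audio:
        raise ValueError("reference audio must not be empty")
    n = len(audio)
    mean = sum(audio) / n
    rms = math.sqrt(sum(x * x for x in audio) / n)
    return [mean, rms]


def synthesize(request: ZeroShotRequest) -> List[float]:
    """Toy stand-in for the synthesizer (NOT real speech).

    Emits a 220 Hz tone whose amplitude is the reference clip's RMS and
    whose duration is 0.1 s per character of text.
    """
    _, rms = speaker_embedding(request.reference_audio)
    n_samples = (SAMPLE_RATE // 10) * len(request.text)
    return [
        rms * math.sin(2 * math.pi * 220 * t / SAMPLE_RATE)
        for t in range(n_samples)
    ]
```

In a real system, the embedding function would be a learned speaker encoder (or the reference audio would be used directly as a prompt), and the synthesizer would be a neural acoustic model with a vocoder or codec decoder. The key property is that the target speaker only appears as an input at inference time.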