Neural Codec Language Models are Zero-Shot Text-to-Speech Synthesizers
VALL-E can generate speech in anyone's voice with only a 3-second sample of the speaker and some text
Notes from the paper Large-scale Contrastive Language-Audio Pre-training with Feature Fusion and Keyword-to-Caption Augmentation by Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov
Notes from the paper A Discriminative Feature Learning Approach for Deep Face Recognition by Yandong Wen, Kaipeng Zhang, Zhifeng Li, and Yu Qiao
Notes from the paper ArcFace: Additive Angular Margin Loss for Deep Face Recognition by Jiankang Deng, Jia Guo, Niannan Xue, Stefanos Zafeiriou
Notes from 3Blue1Brown's video series, Essence of Linear Algebra
Notes from Coursera's Mathematics for Machine Learning: Linear Algebra by Imperial College London
Notes from the Khan Academy video series on the Law of Cosines
Notes from the Deep Learning for Coders (2020) video series by Jeremy Howard and Sylvain Gugger (fast.ai)