My name is Yuancheng Wang (王远程). I'm a first-year Ph.D. student at the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), supervised by Professor Zhizheng Wu. Before that, I received my B.S. degree from CUHK-Shenzhen. I also collaborate with Xu Tan (谭旭) from Microsoft Research Asia.
My research interests include text-to-speech synthesis, text-to-audio generation, and unified audio representation and generation. I am one of the main contributors and leaders of the open-source Amphion toolkit.
I have developed NaturalSpeech 3, an advanced text-to-speech model with factorized speech representation and modeling.
- 2024.09: 🔥 We released MaskGCT, a new SOTA large-scale TTS system with masked generative models.
- 2024.08: 🎉 Our papers Amphion and Emilia were accepted by IEEE SLT 2024.
- 2024.07: 🔥 We released Emilia, an extensive, multilingual, and diverse speech dataset for large-scale speech generation, with 101k hours of speech in six languages and varied speaking styles.
- 2024.05: 🎉 Our paper Factorized Diffusion Models are Natural and Zero-shot Speech Synthesizers, aka NaturalSpeech 3, was accepted by ICML 2024 as an Oral presentation!
- 2024.03: 🎉 We are delighted to release NaturalSpeech 3, an advanced version of the NaturalSpeech series with speech factorization. We have also released the FACodec checkpoints and a demo in the HuggingFace Amphion Space.
- 2023.11: 🔥 We released Amphion v0.1 (⭐️ 4.4k+), which is an open-source toolkit for audio, music, and speech generation.
- 2023.09: 🎉 My first paper on audio generation and editing, AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models, was accepted by NeurIPS 2023!