VALL-E: Microsoft's new zero-shot text-to-speech model can duplicate everyone's voice in three seconds (Damir Yalalov/Metaverse Post) 10-01-2023

Damir Yalalov / Metaverse Post:
Since the release of the first text-to-speech (TTS) model, researchers have sought to improve how these systems generate speech. The latest model from Microsoft …