
Microsoft’s Latest AI Innovation VALL-E Can Recreate Your Voice in 3 Seconds





Microsoft has created a new artificial intelligence (AI) system called VALL-E. This AI can recreate your voice based on a three-second clip you provide to it. Three seconds is all it needs to sound just like you.

What this means is that VALL-E can make voice clips that sound exactly like you, even if you never actually said those words.

The AI can also preserve your emotional tone and the background sounds of the environment you were in when you recorded the original voice sample.

We seem to be in very creepy territory here with the AI offerings coming on stream.

There are a lot of potential uses for VALL-E, including creating text-to-speech recordings without having to go to a recording studio, creating whole podcasts without anything but text, editing recordings of people’s voices to add words to make them sound better, and creating new social media content when combined with other AI systems.

We could also attribute speech to someone who said no such thing.

While VALL-E isn’t available for public use yet, you can go to the site and listen to some demonstrations of how it works.


In the future, people whose social media accounts are managed by others could hand even more of the work to their assistants, since new content would no longer need a recording of their own voice.

There are darker things to think about with the introduction of this technology. Voice recordings are used to identify criminals and bring them to court, and convincing synthetic speech could put that process in jeopardy.

There are other scenarios. If you use your voice as identification for any of your tech, your security could be in trouble. One sneakily recorded voice clip and your spurned lover could access your devices.


How VALL-E works

VALL-E works differently from other text-to-speech systems: rather than just manipulating waveforms, it creates something called “acoustic tokens” from text and audio samples. These tokens are like building blocks that represent how a person’s voice sounds. VALL-E uses them to create new audio that mimics the original speaker much more closely.
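To make the idea of “acoustic tokens” a little more concrete, here is a toy sketch in Python. It is not Microsoft’s code and not how VALL-E is actually built: the real system relies on a learned neural audio codec, while this example uses a tiny hand-written codebook. All names here (`CODEBOOK`, `tokenize`, `detokenize`) are made up for illustration. The point is only to show how a stretch of audio can be turned into a short list of whole numbers, and how those numbers can be turned back into sound.

```python
# Toy illustration of "acoustic tokens": each short chunk of audio is
# replaced by the index of the most similar entry in a codebook.
# The codebook below is hand-written and hypothetical; a real system
# would learn thousands of entries from data.

CODEBOOK = [
    [0.0, 0.0, 0.0, 0.0],      # token 0: near silence
    [1.0, 1.0, 1.0, 1.0],      # token 1: steady high level
    [-1.0, -1.0, -1.0, -1.0],  # token 2: steady low level
    [1.0, -1.0, 1.0, -1.0],    # token 3: fast oscillation
]


def frame_signal(samples, frame_size):
    """Split samples into non-overlapping frames, dropping any leftover."""
    last_start = len(samples) - frame_size
    return [samples[i:i + frame_size] for i in range(0, last_start + 1, frame_size)]


def nearest_code(frame, codebook):
    """Return the index of the codebook entry closest to the frame."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: squared_distance(frame, codebook[i]))


def tokenize(samples, codebook):
    """Turn raw samples into a list of token indices."""
    frame_size = len(codebook[0])
    return [nearest_code(frame, codebook) for frame in frame_signal(samples, frame_size)]


def detokenize(tokens, codebook):
    """Rebuild an approximate waveform by concatenating codebook entries."""
    samples = []
    for token in tokens:
        samples.extend(codebook[token])
    return samples


if __name__ == "__main__":
    audio = [0.1, -0.1, 0.0, 0.05,
             0.9, 1.1, 1.0, 0.8,
             -0.9, -1.0, -1.2, -1.0,
             0.9, -0.8, 1.1, -1.0]
    tokens = tokenize(audio, CODEBOOK)
    print(tokens)  # [0, 1, 2, 3]
    print(detokenize(tokens, CODEBOOK))
```

Once speech is a sequence of tokens like this, producing new speech becomes a matter of predicting the next token, much as a text AI predicts the next word. That is the broad idea, under the simplifying assumptions above.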

So why can’t we, the unwashed masses, use it yet? It seems Microsoft knows people will use it for nefarious purposes.

Microsoft acknowledged that VALL-E carries potential risks of misuse, such as spoofing voice identification or impersonating a specific speaker, but says it is working on mitigation strategies.

“To mitigate such risks, it is possible to build a detection model to discriminate whether an audio clip was synthesised by VALL-E. We will also put Microsoft AI Principles into practice when further developing the models.”

As always, it’s a brave new world, and things are definitely getting weird.