VALL-E Artificial Intelligence Can Generate a Voice From a 3-Second Clip Sample

 


DALL-E's artificial intelligence can generate images from text provided by the user. Now VALL-E, an artificial intelligence developed by Microsoft, can generate an audio clip in a person's voice from a sample of that person speaking for only 3 seconds.



Microsoft says the technology can be used to generate human-quality audio clips for text-to-speech applications. VALL-E was trained on 60,000 hours of speech from LibriLight, an audio library assembled by Meta that consists of recordings of 7,000 individuals reading LibriVox audiobooks.


Aware of the problems that could arise if VALL-E were used to produce fake audio in a specific individual's voice, Microsoft has not made it available to the public. It is currently accessible only to researchers at Microsoft. VALL-E is still being developed to improve how accurately it reproduces the voice in a given sample.
