Microsoft develops eerily realistic AI voice generator, but keeps it under wraps

The Sways

3 months ago



Advances in AI are usually associated with flashy releases and wide availability, but they are increasingly forcing tech giants to tread carefully. Microsoft’s latest innovation, VALL-E 2, is a prime example of this trend. This AI marvel can mimic human speech with astonishing accuracy using just a few seconds of audio, marking a significant leap in text-to-speech (TTS) technology.

In a move that highlights the growing ethical concerns around advanced AI, Microsoft has developed a remarkably realistic text-to-speech system, VALL-E 2, but has chosen to keep it under wraps due to potential misuse.

“VALL-E 2 is the first voice AI to reach human parity in speech robustness, naturalness, and speaker similarity,” the Microsoft researchers proudly declare. This “human parity” means that AI-generated speech is nearly indistinguishable from a real person’s voice.

So, what makes VALL-E 2 so believable?

Two key features contribute to its realism. The first, “Repetition Aware Sampling”, tracks how often the model has recently produced the same audio token and adjusts its sampling when it detects a repetitive pattern, which keeps the output from getting stuck in loops and makes the speech flow more naturally. The second, “Grouped Code Modeling”, organises the audio codes into groups so that the model works on a shorter sequence, which speeds up speech generation and makes long sequences easier to handle.
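To make the two ideas more concrete, here is a minimal Python sketch. It is an illustration only, not Microsoft's implementation: the function names, the window size, the repetition threshold and the padding value are all assumptions chosen for the example, and the "model" is replaced by a hand-written probability list.

```python
import random
from typing import List, Sequence


def nucleus_sample(probs: Sequence[float], top_p: float, rng: random.Random) -> int:
    """Sample a token id from the smallest set of tokens whose total probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    mass = 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def repetition_aware_sample(
    probs: Sequence[float],
    history: Sequence[int],
    rng: random.Random,
    top_p: float = 0.8,      # assumed value, for illustration
    window: int = 10,        # assumed value, for illustration
    max_ratio: float = 0.5,  # assumed value, for illustration
) -> int:
    """Pick the next token, falling back to plain random sampling when it repeats too often."""
    token = nucleus_sample(probs, top_p, rng)
    recent = list(history[-window:])
    if recent:
        ratio = recent.count(token) / len(recent)
        if ratio > max_ratio:
            # The candidate dominates the recent history: resample from the
            # full distribution so that other tokens get a chance.
            token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return token


def group_codes(codes: List[int], group_size: int, pad: int = 0) -> List[List[int]]:
    """Split a code sequence into fixed-size groups, padding the last group if needed."""
    padded = codes + [pad] * (-len(codes) % group_size)
    return [padded[i:i + group_size] for i in range(0, len(padded), group_size)]


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.9, 0.05, 0.05]
    stuck_history = [0] * 10  # the same token produced ten times in a row
    print(repetition_aware_sample(probs, stuck_history, rng))
    print(group_codes([1, 2, 3, 4, 5], group_size=2))
```

In this toy setup, nucleus sampling on its own would pick token 0 every time, because that token alone covers the 0.8 probability cut-off. The repetition check is what lets the other tokens through once token 0 has filled the recent history. Grouping five codes in pairs turns a five-step sequence into three steps, which is the sense in which the model handles a shorter sequence.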

Fears of misuse overshadow potential.

Despite its vast potential in education, entertainment, accessibility, and more, Microsoft has opted to keep VALL-E 2 under tight control. The company cites concerns about potential misuse, particularly voice identification spoofing and convincing impersonations.

Despite these concerns, Microsoft remains optimistic about the future of AI speech technology. The researchers envision safe and ethical applications in which synthesised speech preserves a speaker’s identity, provided the speaker has given proper consent and robust detection mechanisms are in place.

This groundbreaking research has been detailed in a pre-print paper, offering a glimpse into the future of AI while raising crucial questions about its responsible development and deployment.