OpenAI, the company behind the AI chatbot ChatGPT, introduced a new AI model last week (March 29) that can replicate any voice in any language from just a brief audio sample. Known as Voice Engine, it lets users upload a 15-second clip and generates new audio in the same voice and manner of speaking.
For example, if you upload a man’s voice sample and add a text prompt saying, “Make him sing the American national anthem,” the model will analyse the sample and create an output in his voice. It can also produce audio clips of the same speaker in another language.
However, the company has yet to release it for public use, reportedly over safety issues. We explain why.
Simply put, Voice Engine is a text-to-audio tool that generates new audio from a voice sample, guided by a user’s written prompt. It is also capable of replicating voices across languages.
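Voice Engine itself has no public API, so its exact interface is not known. As a rough illustration of the text-to-audio workflow described above, the sketch below uses OpenAI's existing text-to-speech endpoint, which offers only preset voices; the 15-second reference sample that Voice Engine would add is flagged in the comments as an assumption, not a documented feature.

```python
# Illustrative sketch only: Voice Engine has no public API.
# This calls OpenAI's existing text-to-speech endpoint with a preset voice.
# Voice Engine, as described by OpenAI, would additionally take a roughly
# 15-second reference clip (e.g. "speaker_sample.wav") so the output matches
# that speaker -- that step is an assumption here, not an available call.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.audio.speech.create(
    model="tts-1",    # publicly available TTS model, not Voice Engine
    voice="alloy",    # preset voice; Voice Engine would use the cloned voice
    input="Oh say can you see, by the dawn's early light...",  # text to be spoken
)

# Save the generated speech to an MP3 file.
with open("output.mp3", "wb") as f:
    f.write(response.content)
```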
OpenAI said it has been working on this model since late 2022 and reportedly kept mum about it due to safety concerns until March 29, when it made small previews available.
According to OpenAI, it can prove helpful in the following ways:
1. Providing reading assistance: This feature helps non-readers and children learn and understand languages through natural-sounding, emotive voices representing a variety of speakers.
OpenAI shared the example of an education technology provider, Age of Learning, which has been using pre-scripted voice-over content. The company also uses Voice Engine and GPT-4 to create real-time and personalised responses to engage students.
2. Translating content: For professionals and content creators, this could be the most useful feature of Voice Engine, as it allows them to reach wider audiences. Crucially, it retains the accent and tonal nuances of the original speaker even while translating the audio content. This adds to the realism of the output.
The samples shared on OpenAI’s website demonstrate how fluently it translates. The multilingual nature of the model makes it well suited to global applications. OpenAI shared the example of HeyGen, an AI visual storytelling platform that creates human-like avatars for a variety of content. The platform uses Voice Engine for translation.
3. Supporting people who are non-verbal, helping patients recover their voice: Among the listed use cases, Voice Engine is claimed to be beneficial for non-verbal individuals since it offers personalised, non-robotic voices. This will enable those with disabilities or learning needs to communicate easily and consistently, even in multilingual contexts.
OpenAI said that Livox, an alternative communication app that enables people with disabilities to communicate, is using Voice Engine. Similarly, the US-based Norman Prince Neurosciences Institute is pioneering AI’s clinical applications by restoring voices for patients with speech impairments due to neurological or oncological conditions.
The institute uses Voice Engine to recreate a patient’s voice from a brief video clip, offering new hope and possibilities for individuals facing speech challenges.
There have been numerous instances where AI voice cloning has been used to dupe people. These AI tools use deep learning algorithms that analyse voice samples to create speech. As the models have been refined over time, the voices they generate have become increasingly realistic, making them dangerous in the wrong hands.
This year, over 60 nations (including India) will hold elections. Deepfakes and AI cloning tools are already being used by malicious actors who assume the identities of popular figures to sway political sentiment.
The new Voice Engine is likely to prompt many companies to rush to update their platforms, even as the technology’s risks become apparent. The absence of legal regulation around such powerful technologies is a further concern.
What has OpenAI said?
Several users on X (formerly Twitter) spoke of the potential misuse of the technology. One user pointed out how cloning public figures’ voices could lead to several lawsuits against OpenAI. Meredith Whittaker, President of the messaging app Signal, said that the risks of voice AI cloning are not new and the benefits do not justify the risks.
Voice cloning AI isn’t new + the risks have been known for ages, as has the fact that the “benefits” don’t justify the risks
Cynically, this looks like a performance of “judicious caution” meant to give MS execs an e.g. of “self regulation working” they can reference if pressed. https://t.co/QcZs0n8vnf
— Meredith Whittaker (@mer__edith) March 31, 2024
OpenAI has acknowledged the risks involved and said it is already working with various stakeholders.
“We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year. We are engaging with US and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build,” said the company.
It added its partners have agreed to its usage policies, which “prohibit the impersonation of another individual or organization without consent or legal right.”
“Partners must also clearly disclose to their audience that the voices they’re hearing are AI-generated. Finally, we have implemented a set of safety measures, including watermarking to trace the origin of any audio generated by Voice Engine, as well as proactive monitoring of how it’s being used,” it added.
OpenAI also said it hoped to start a “dialogue” on the responsible public deployment of such technology. It suggested, among other things, a “no-go voice list that detects and prevents the creation of voices that are too similar to prominent figures.”
However, worries over potential misuse accompany most new AI technology being rolled out. Voice Engine is one of several powerful AI tools OpenAI is building in line with its vision of creating Artificial General Intelligence (AGI). Reportedly, OpenAI is also partnering with filmmakers to create films using Sora, its AI tool that generates videos from text prompts.