OpenAI is offering limited access to a text-to-voice generation platform it developed called Voice Engine, which can create a synthetic voice based on a 15-second clip of someone’s voice. The AI-generated voice can read out text prompts on command in the same language as the speaker or in a number of other languages. “These small scale deployments are helping to inform our approach, safeguards, and thinking about how Voice Engine could be used for good across various industries,” OpenAI said in its blog post.
Companies with access include the education technology company Age of Learning, visual storytelling platform HeyGen, frontline health software maker Dimagi, AI communication app creator Livox, and health system Lifespan.
In these samples posted by OpenAI, you can hear what Age of Learning has been doing with the technology to generate pre-scripted voice-over content, as well as to read out “real-time, personalized responses” to students written by GPT-4.
First, the reference audio in English:
And here are three AI-generated audio clips based on that sample:
OpenAI said it began developing Voice Engine in late 2022 and that the technology already powers the preset voices for the text-to-speech API and ChatGPT’s Read Aloud feature. In an interview with TechCrunch, Jeff Harris, a member of OpenAI’s product team for Voice Engine, said the model was trained on “a mix of licensed and publicly available data.” OpenAI told the publication the model will only be available to about 10 developers.
AI text-to-audio generation is an area of generative AI that’s continuing to evolve. While most tools focus on instrumental or natural sounds, fewer have tackled voice generation, partly because of the concerns OpenAI raised. Some names in the space include companies like Podcastle and ElevenLabs, which provide AI voice cloning technology and tools the Vergecast explored last year.
According to OpenAI, its partners agreed to abide by its usage policies, which say they won’t use Voice Engine to impersonate people or organizations without their consent. OpenAI also requires partners to get the “explicit and informed consent” of the original speaker, not to build ways for individual users to create their own voices, and to disclose to listeners that the voices are AI-generated. OpenAI has also added watermarking to the audio clips to trace their origin and actively monitors how the audio is used.
OpenAI suggested several steps it thinks could limit the risks around tools like these. These include phasing out voice-based authentication for accessing bank accounts, adopting policies to protect the use of people’s voices in AI, expanding education about AI deepfakes, and developing systems for tracking AI-generated content.