OpenAI introduces ChatGPT’s lifelike voice feature to select paying users

OpenAI began rolling out ChatGPT’s Advanced Voice Mode on Tuesday, offering users their first glimpse of GPT-4o’s hyper-realistic audio responses. This alpha version will be available today to a select group of ChatGPT Plus users, with plans to gradually extend access to all Plus subscribers by fall 2024.


When OpenAI first demonstrated GPT-4o’s voice in May, it stunned audiences with its rapid responses and striking resemblance to a real human’s voice. Notably, one voice, named Sky, sounded similar to Scarlett Johansson, prompting Johansson to hire legal counsel after the demo, though the company denied using her likeness. OpenAI subsequently removed the Sky voice from its demo and announced a delay to strengthen safety measures for the Advanced Voice Mode.

Now, after a month of waiting, the feature is partially available. However, OpenAI has noted that video and screen-sharing capabilities from the Spring Update will not be part of this alpha release and will launch at a later date. For the moment, the impressive GPT-4o demo remains a demo, but some premium users can now experience ChatGPT’s new voice feature.


ChatGPT’s previous voice solution used three separate models: one to convert voice to text, GPT-4 to process the prompt, and another to convert text back to voice. In contrast, GPT-4o integrates these functions into a single multimodal model, enabling conversations with significantly lower latency. It can also detect emotional tones such as sadness and excitement, and even singing.
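The architectural difference above can be sketched in a few lines. This is purely illustrative: the function names and latency figures are invented placeholders, not OpenAI's actual API, and serve only to show why chaining three models stacks delay while a single multimodal model does not.

```python
# Conceptual sketch only: all functions and latency numbers are
# hypothetical stand-ins, not OpenAI's real API or measurements.

def legacy_voice_pipeline(audio: str) -> tuple[str, int]:
    """Three separate models; per-hop latencies (ms) accumulate."""
    def speech_to_text(a):  return (f"text({a})", 300)   # model 1
    def gpt4(t):            return (f"reply({t})", 500)  # model 2
    def text_to_speech(r):  return (f"audio({r})", 300)  # model 3

    total = 0
    text, ms = speech_to_text(audio);  total += ms
    reply, ms = gpt4(text);            total += ms
    out, ms = text_to_speech(reply);   total += ms
    return out, total  # latency is the sum of all three hops

def multimodal_voice(audio: str) -> tuple[str, int]:
    """One model consumes audio and emits audio directly: a single hop."""
    return f"audio(reply({audio}))", 320

print(legacy_voice_pipeline("hi"))  # ('audio(reply(text(hi)))', 1100)
print(multimodal_voice("hi"))       # ('audio(reply(hi))', 320)
```

The point of the sketch is structural: in the old pipeline, emotional cues in the audio are lost at the speech-to-text step, which is why a single model that hears the raw audio can pick up tone and singing.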

During this pilot phase, ChatGPT Plus users will be among the first to experience the advanced voice capabilities. TechCrunch has yet to test the feature but will provide a review once access is granted.

OpenAI is gradually releasing the new voice feature to monitor its usage closely. Users in the alpha group will receive an alert in the ChatGPT app, followed by an email with instructions on how to use it.

Since the May demo, OpenAI has tested GPT-4o’s voice with over 100 external testers who speak 45 different languages. A report on these safety measures is expected in early August. The Advanced Voice Mode will offer four preset voices—Juniper, Breeze, Cove, and Ember—developed in collaboration with professional voice actors. The Sky voice shown in the May demo is no longer available. OpenAI spokesperson Lindsay McCallum emphasized that ChatGPT will not impersonate other individuals’ voices and will block outputs that do not align with the preset voices.

To avoid controversies related to deepfakes, OpenAI has introduced new filters to block requests for generating copyrighted music or audio. This measure follows recent legal challenges faced by AI companies over copyright infringement, with record labels being particularly proactive in filing complaints against AI-generated content.
