The voice feature powered by GPT-4o was eerily human-like, borderline flirty, and gave a glimpse into what the future of AI chatbots could look like. The feature offers natural, real-time conversations that you can interrupt at any time. It also senses and responds to your emotions, according to OpenAI.
After several months of anticipation, and a legal battle with Scarlett Johansson, OpenAI finally started rolling out Advanced Voice Mode to ChatGPT Plus and Team users on Tuesday.
It’ll also start offering the feature to Enterprise and Edu users next week, although it’s not yet available in the EU, UK, Switzerland, Iceland, Norway, and Liechtenstein, the company told Business Insider.
OpenAI will also release five new voices to Standard and Advanced Voice Mode in addition to the four already available. You’ll know you have access to the new feature once you see a pop-up message next to the entry point to Voice Mode.
I’ve been trying out the alpha version of Advanced Voice Mode over the last couple of weeks, so here’s what you can expect once you get access.
It’s highly entertaining
It doesn’t get old — or at least it hasn’t yet for me.
The ability to interrupt the chatbot mid-response was an oddly satisfying experience that made me feel in control. It also took out the frustration of having to go back and forth with a virtual assistant that didn’t hear me right the first time.
I also got a kick out of pulling up Advanced Voice Mode in front of others and watching their jaws drop when it responds. It’s slightly eerie how human it sounds, but also incredibly impressive.
The intonation is close to flawless, and everything from its thoughtful pauses to its laughter at its own jokes creates a surreal experience. To test its understanding of more complex topics, I gave it sample SAT questions and asked it to choose the right answer. It took on the role of a tutor and guided me through the solutions step by step, getting the answers correct.
Although I mainly stuck with the Breeze voice, the range of voice options provided a glimpse into how people can personalize their experience. For the first time, I understood how people can use AI as a companion.
The response accuracy isn’t quite there yet
While many of the individual responses I received from Advanced Voice Mode were accurate and helpful, my overall conversations were less successful.
The chatbot would sometimes stop listening to my voice or start listening late. An OpenAI spokesperson said the company used learnings from the alpha version to improve conversation speed and smoothness, so some of those lapses may be fixed.
Advanced Voice Mode also didn’t give the thoroughness or detail that the text mode offers. There were also some instances where the conversation went in circles.
For example, I asked Advanced Voice Mode to recommend the best credit card reward program for me, but it got sidetracked into a conversation about my hobbies. It eventually suggested American Express, but I had to redirect the conversation multiple times, and it didn’t offer much detail about the card.
ChatGPT’s Advanced Voice Mode feels a world apart from Siri or other voice assistants, which don’t offer smooth back-and-forth conversations or pick up on emotion. But it’s not quite up to speed with the text version yet.