In the year and a half since ChatGPT launched, one lingering question has become more pressing: If AI can do this, why is my phone assistant still so bad?
On Monday, the chasm grew wider still, when OpenAI announced a new model called GPT-4o (the "o" stands for "omni") that gives the chatbot new capabilities to understand and generate audio, video and still images.
The system is amazing to behold. It can hold long conversations about the world seen through a camera lens, translate live between two languages, and even laugh at appropriate moments.
The shine will inevitably wear off as users discover the system’s shortcomings, but its creators are more confident than ever. When GPT-4 was released in 2023, OpenAI founder Sam Altman tweeted that the AI “is still flawed, still limited, and still looks more impressive on first use than after spending more time with it.”
A year later, there was no such hedging about its successor's release: in addition to a longer statement about "an exciting future in which we will be able to use computers to do much more than ever before," Altman tweeted a single word: "her," the title of the 2013 Spike Jonze film in which a man slowly falls in love with his artificial-intelligence assistant.
GPT-4o is closer than ever to that science fiction scenario. Previous versions of the AI have been able to speak to the user, but only through a laborious process of transcribing speech to text, running it through the normal ChatGPT system and then generating human-sounding speech in response.
By contrast, the new system can operate directly on speech, without relying on other models to support it, which speeds up responses and lets it pick up nuances such as tone of voice.
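The old, cascaded approach described above can be sketched in a few lines. This is a minimal illustration, not OpenAI's actual code; the function names and their placeholder outputs are hypothetical stand-ins for the three separate models involved.

```python
# Sketch of the cascaded voice pipeline pre-GPT-4o assistants used.
# All function names here are hypothetical placeholders, not a real API.

def transcribe(audio: bytes) -> str:
    """Stage 1: speech-to-text. Tone of voice is discarded here,
    because only the words survive transcription."""
    return "what's the weather like?"  # placeholder transcript

def chat(prompt: str) -> str:
    """Stage 2: the text-only language model."""
    return f"Here is my answer to: {prompt}"  # placeholder reply

def synthesize(text: str) -> bytes:
    """Stage 3: text-to-speech, producing audio for playback."""
    return text.encode("utf-8")  # placeholder waveform

def cascaded_assistant(audio: bytes) -> bytes:
    # Three model calls chained in sequence: each hop adds latency,
    # which is the delay GPT-4o removes by working on audio directly.
    return synthesize(chat(transcribe(audio)))
```

The point of the sketch is the chaining itself: three round trips where a natively multimodal model needs one.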
But it is still not an AI assistant. It can answer questions and do knowledge work, but it cannot yet act on requests. The GPT Store, a repository of third-party integrations compiled by OpenAI, could help, but to truly integrate into the lives of ordinary people, GPT needs the powers of Siri.
And it seems that Apple agrees. The iPhone maker has reportedly been in talks since March with AI developers, including Google and OpenAI, about licensing their technology to improve its own AI assistant. Over the weekend, it was supposedly "close" to an agreement with the latter. According to Bloomberg, which broke the news, the deal would allow Apple to offer ChatGPT alongside other AI features it will announce at its annual Worldwide Developers Conference in June.
The tie-up would probably stop short of replacing Siri with ChatGPT entirely. This is partly because Apple is wary of building too much of another company's technology into its own devices (the scars of the painful replacement of Google Maps with Apple Maps more than a decade ago still sting), but also because even the best artificial intelligence systems are not entirely ready for the kind of demands an assistant faces.
When it comes to an AI system that can perform tasks, generic intelligence is less important than predictability. You don't want your AI to be able to text your friends if you can't be sure what it will say before it sends them — a real problem faced by some of the hot AI hardware startups such as Humane and Rabbit, whose promises to replace the smartphone with AI have fallen flat.
Training an AI system to do exactly the same thing, in the same way, every time it is asked is, counterintuitively, somewhat harder than building one that gives varied but correct answers to every question. But if the technology keeps improving at its current rate, even your phone's AI assistant may not be bad for much longer.