OpenAI made its last major breakthrough in artificial intelligence by scaling its models to dizzying proportions with the introduction of GPT-4 last year. Today the company announced a new advance that signals a shift in focus: a model that can logically “reason” through many difficult problems and is significantly smarter than existing AI, without requiring a major scale-up.
The new model, dubbed OpenAI o1, can solve problems that stump existing AI models, including OpenAI’s most powerful model, GPT-4o. Rather than producing an answer in a single step, as a large language model typically does, it reasons through the problem, effectively thinking out loud as a person might, before arriving at the correct result.
“This is what we consider the new paradigm in these models,” Mira Murati, OpenAI’s chief technology officer, tells WIRED. “It’s much better at tackling very complex reasoning tasks.”
The new model has been codenamed Strawberry within OpenAI and is not a successor to GPT-4o but rather a complement to it, the company says.
Murati says OpenAI is currently building its next flagship model, GPT-5, which will be considerably larger than its predecessor. But while the company still believes scale will help extract new capabilities from AI, GPT-5 will likely also include the reasoning technology unveiled today. “There are two paradigms,” Murati says. “The scaling paradigm and this new paradigm. We hope to bring them together.”
LLMs typically derive their answers from huge neural networks fed with vast amounts of training data. They may exhibit remarkable linguistic and logical skills, but they traditionally struggle to solve surprisingly simple problems, such as rudimentary math questions that involve reasoning.
Murati says OpenAI o1 uses reinforcement learning, which involves giving a model positive feedback when it answers correctly and negative feedback when it doesn’t, to improve its reasoning process. “The model sharpens its thinking and fine-tunes the strategies it uses to arrive at the answer,” she says. Reinforcement learning has allowed computers to play games with superhuman skill and perform useful tasks, such as designing computer chips. The technique is also a key ingredient in turning an LLM into a useful, well-behaved chatbot.
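The feedback loop Murati describes can be illustrated with a toy sketch. This is not OpenAI’s training setup; the “strategies,” success rates, and update rule below are hypothetical stand-ins for the basic idea: answers that earn positive feedback make their strategy more likely to be used, and wrong answers do the opposite.

```python
import random

def train(strategies, success_rate, steps=5000, lr=0.1, seed=0):
    """Learn a preference weight per strategy from +1/-1 rewards."""
    rng = random.Random(seed)
    weights = {s: 0.0 for s in strategies}
    for _ in range(steps):
        s = rng.choice(strategies)           # try a strategy
        correct = rng.random() < success_rate[s]
        reward = 1.0 if correct else -1.0    # feedback on the answer
        weights[s] += lr * reward            # reinforce or discourage it
    return weights

# Hypothetical strategies with assumed success rates.
strategies = ["guess", "step_by_step"]
success = {"guess": 0.3, "step_by_step": 0.9}
w = train(strategies, success)
best = max(w, key=w.get)
print(best)
```

After enough trials, the more reliable strategy accumulates the higher weight, which is the core intuition behind rewarding a model’s reasoning process rather than only its final phrasing.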
Mark Chen, OpenAI’s vice president of research, showed off the new model to WIRED, using it to solve several problems that the company’s previous model, GPT-4o, couldn’t. Among them was an advanced chemistry question and the following mind-bending math puzzle: “A princess is the same age as the prince will be when the princess is twice as old as the prince was when the princess’s age was half the sum of their current ages. What is the age of the prince and the princess?” (The correct answer is that the prince is 30 and the princess is 40.)
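The stated answer can be verified with a few lines of arithmetic. The check below simply mirrors the riddle clause by clause; it is an illustration of the puzzle, not anything from OpenAI’s demo.

```python
def check(princess, prince):
    # "...when the princess's age was half the sum of their current ages"
    past_princess = (princess + prince) / 2
    years_ago = princess - past_princess
    past_prince = prince - years_ago
    # "...twice as old as the prince was" at that time
    target = 2 * past_prince
    years_ahead = target - princess       # when the princess reaches that age
    future_prince = prince + years_ahead
    # "A princess is the same age as the prince will be" at that point
    return princess == future_prince

print(check(40, 30))  # True
```

With a princess of 40 and a prince of 30: five years ago she was 35 (half of 70) and he was 25; twice 25 is 50, which she reaches in ten years, when he will be 40, her current age.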
“The (new) model is learning to think for itself, rather than trying to mimic the way humans would think,” as a conventional LLM does, Chen says.
OpenAI says its new model performs noticeably better on a range of problem sets, including those focused on coding, math, physics, biology, and chemistry. On the American Invitational Mathematics Examination (AIME), a challenging competition for high school students, GPT-4o solved an average of 12 percent of problems, while o1 got 83 percent right, according to the company.