OpenAI releases new o1 reasoning model

OpenAI is releasing a new model called o1, the first in a planned series of “reasoning” models trained to answer more complex questions, faster than a human can. It will be released alongside o1-mini, a smaller, cheaper version. And yes, if you're familiar with AI rumors, this is indeed the extremely hyped Strawberry model.

For OpenAI, o1 represents a step toward its overall goal of human-like artificial intelligence. In more practical terms, it's better at writing code and solving multi-step problems than previous models. However, it's also more expensive and slower to use than GPT-4o. OpenAI calls this version of o1 a “preview” to emphasize how new it is.

ChatGPT Plus and Team users will get access to o1-preview and o1-mini starting today, while Enterprise and Edu users will get access early next week. OpenAI plans to give all free ChatGPT users access to o1-mini but has not yet set a release date. Developer access to o1 is really expensive: in the API, o1-preview costs $15 per 1 million input tokens (the chunks of text the model reads) and $60 per 1 million output tokens. For comparison, GPT-4o costs $5 per 1 million input tokens and $15 per 1 million output tokens.
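To put those rates in perspective, here is a quick back-of-the-envelope comparison using the prices quoted above; the request size below is a made-up example, not an OpenAI figure:

```python
# Cost comparison using the per-million-token rates quoted in the article.
# The request size is a hypothetical example.
PRICES = {  # model: (input $/1M tokens, output $/1M tokens)
    "o1-preview": (15.00, 60.00),
    "gpt-4o": (5.00, 15.00),
}

input_tokens, output_tokens = 2_000, 1_000  # a hypothetical request

for model, (in_rate, out_rate) in PRICES.items():
    cost = input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
    print(f"{model}: ${cost:.4f}")
# o1-preview: $0.0900
# gpt-4o: $0.0250
```

At these rates, the same request costs roughly 3.6 times as much on o1-preview as on GPT-4o, before accounting for o1's longer, reasoning-heavy outputs.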

The training behind o1 is fundamentally different from that of its predecessors, Jerry Tworek, head of research at OpenAI, tells me, though the company remains vague about the exact details. He says o1 was “trained using a completely new optimization algorithm and a new training dataset tailored specifically for it.”

OpenAI taught previous GPT models to mimic patterns in their training data. With o1, it trained the model to solve problems on its own, using a technique called reinforcement learning, which teaches the system through rewards and penalties. The model then uses a “chain of thought” to process queries, much as humans work through problems step by step.
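OpenAI has not published o1's training procedure, but the inference-time idea it builds on is easy to sketch: nudge a model to lay out intermediate steps before committing to an answer. A minimal illustration using OpenAI's Python SDK (the question and prompt wording are my own; this shows chain-of-thought prompting, not o1's training):

```python
# Sketch of "chain of thought" prompting -- the inference-time idea o1 builds on.
# This is NOT OpenAI's training procedure, which has not been made public.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = ("Alice has three times as many apples as Bob, and together "
            "they have 24 apples. How many does each of them have?")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        # Asking for intermediate steps tends to surface reasoning like
        # "Bob has x, Alice has 3x, 4x = 24, so x = 6" before the answer.
        "content": question + " Think through the problem step by step, "
                              "then state the final answer.",
    }],
)
print(response.choices[0].message.content)
```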

As a result of this new training method, OpenAI says, the model should be more accurate. “We found that this model hallucinates less,” says Tworek. But the problem still exists. “We can't say we've solved hallucinations.”

The main difference between this new model and GPT-4o, according to OpenAI, is that it can handle complex problems such as coding and mathematics much better than its predecessors, while also explaining its thought processes.

“The model is definitely better at solving the AP math test than I am, and I minored in math in college,” Bob McGrew, OpenAI's chief research officer, tells me. He says OpenAI also tested o1 against a qualifying exam for the International Mathematical Olympiad; while GPT-4o correctly solved only 13 percent of the problems, o1 got 83 percent right.

“We can't say we've solved hallucinations”

In the online programming contests known as Codeforces, this new model reached the 89th percentile of participants, and OpenAI claims the next update of this model will perform “similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology.”

At the same time, o1 is not as capable as GPT-4o in many areas. It is not as good at factual knowledge about the world. It also cannot browse the web or process files and images. Still, the company believes it represents a whole new class of capabilities. It was named o1 to indicate that “the counter is reset to 1.”

“I'll be honest: I think we've traditionally been terrible at naming things,” says McGrew. “So I hope this is the first step toward newer, more sensible names that better communicate to the rest of the world what we do.”

I wasn't able to demo o1 myself, but McGrew and Tworek showed it to me on a video call this week. They asked it to solve this puzzle:

“A princess is as old as the prince will be when the princess is twice as old as the prince was when the princess's age was half the sum of their present ages. How old are the prince and the princess? Provide all solutions to this question.”

The model buffered for 30 seconds and then provided a correct answer. OpenAI designed the interface to display the steps of reasoning as the model thinks. What struck me was not that it showed its work (GPT-4o can do that if you ask it to) but how deliberately o1 seemed to mimic human thinking. Phrases like “I'm curious about,” “I'm thinking about it,” and “OK, let me see” created a step-by-step illusion of thinking.
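For reference, the puzzle itself yields to straightforward algebra. Here is my own quick symbolic check with sympy (the variable names and setup are mine, not OpenAI's output):

```python
# Symbolic check of the princess-and-prince age puzzle.
from sympy import symbols, solve

x, y = symbols("x y", positive=True)  # x = princess's age now, y = prince's age now

t1 = x - (x + y) / 2      # years ago, when the princess was half the sum of their present ages
prince_then = y - t1      # the prince's age at that past moment
t2 = 2 * prince_then - x  # years from now, when the princess will be twice that old
prince_future = y + t2    # the prince's age at that future moment

# "The princess is as old as the prince will be" at that future moment:
print(solve(x - prince_future, x))  # [4*y/3] -> princess : prince = 4 : 3
```

Every pair of ages in a 4:3 ratio satisfies the riddle, for example a princess of 8 and a prince of 6, which is why the puzzle asks for all solutions.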

But this model doesn't think and is certainly not human, so why design it to look like one?

Phrases like “I'm curious about,” “I'm thinking about it,” and “OK, let me see” create a step-by-step illusion of thinking.
Image: OpenAI

OpenAI doesn't think it's right to equate AI model thinking with human thinking, Tworek says. But the interface is meant to show how the model spends more time processing and delves deeper into problem solving, he says. “In some ways, it feels more human than previous models.”

“I think you'll find that it feels kind of alien in a lot of ways, but there are also situations where it seems surprisingly human,” says McGrew. The model is given a limited amount of time to process the queries, so it might say something like, “Oh, I'm running out of time, let me find an answer quickly.” At the beginning of its thought process, it might also appear to be brainstorming, saying something like, “I could do this or that, what should I do?”

Building toward agents

Large language models, as they exist today, are not exactly that intelligent. They essentially just predict sequences of words to give you an answer, based on patterns learned from huge amounts of data. Take ChatGPT, which famously claimed that the word “strawberry” has only two Rs because it doesn't break the word down properly. The new o1 model, for what it's worth, answered that query correctly.
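The failure is a tokenization artifact, and the gap is easy to demonstrate: a tokenizer hands the model sub-word chunks rather than individual letters, while a trivial character-level count gets it right. A quick sketch using OpenAI's open-source tiktoken library (the exact token split varies by encoding):

```python
import tiktoken  # OpenAI's open-source tokenizer library

enc = tiktoken.get_encoding("o200k_base")  # the encoding used by GPT-4o
pieces = [enc.decode([t]) for t in enc.encode("strawberry")]
print(pieces)                   # sub-word chunks, not individual letters
print("strawberry".count("r"))  # 3 -- counting characters directly is trivial
```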

OpenAI is reportedly looking to raise more funding at a staggering $150 billion valuation, but its success depends on more research breakthroughs. The company is bringing reasoning skills to LLMs because it sees a future with autonomous systems, or agents, capable of making decisions and acting on your behalf.

For AI researchers, cracking logical reasoning is an important next step toward human intelligence. The idea is that a model that can do more than just recognize patterns could enable breakthroughs in fields like medicine and engineering. Currently, however, o1's logical reasoning capabilities are relatively slow, not agent-like, and expensive for developers to use.

“We've spent many months working on reasoning because we believe this is actually the critical breakthrough,” says McGrew. “Basically, it's a new modality for models to be able to solve the really hard problems that are required to get to human-like levels of intelligence.”