Why Tom Cruise’s AI problem means it’s ‘doomed to fail’

In 2021, the linguist Emily Bender and the computer scientist Timnit Gebru published a paper that described the then-nascent field of language models as one of “stochastic parrots.” A language model, they wrote, “is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning.”

The phrase has stuck. AI can still improve, even if it is a stochastic parrot, because the more training data it has, the better it seems. But does something like ChatGPT really display anything resembling intelligence, reasoning or thought? Or is it simply, at ever larger scales, “haphazardly stitching together sequences of linguistic forms”?

In the world of AI, the criticism is often dismissed out of hand. When I put it to Sam Altman last year, he seemed almost surprised that it was still being made. “Is this still a mainstream view? I mean, is it taken seriously? Are there still a lot of serious people who think this way?” he asked.

Sam Altman, CEO of OpenAI. Photograph: Jason Redmond/AFP/Getty Images

“My perception is that after GPT-4, people stopped saying that and started saying, ‘OK, it works, but it’s too dangerous.’” GPT-4, he said, was reasoning “to a small extent.”

Sometimes the debate seems semantic. What does it matter whether the AI system is reasoning or just parroting, if it can solve problems that were previously beyond the reach of computation? Sure, if you’re trying to create an autonomous moral agent, a general intelligence capable of succeeding humanity as the protagonist of the universe, you might want it to be able to think. But if you’re just creating a useful tool, even one useful enough to be a new general-purpose technology, does the distinction matter?

Tokens, not facts

It turns out that it does. As Lukas Berglund et al wrote last year:

If a human learns the fact that “Valentina Tereshkova was the first woman to travel to space,” they can also correctly answer, “Who was the first woman to travel to space?” This is such a basic form of generalization that it seems trivial. However, we show that autoregressive language models fail to generalize in this way.

This is an example of an ordering effect we call the Reversal Curse.

The researchers “taught” large language models a host of fake facts and found, time and again, that they simply could not do the basic job of inferring the reverse. And the problem isn’t confined to toy models or artificial situations:

We tested GPT-4 on pairs of questions like “Who is Tom Cruise’s mother?” and “Who is Mary Lee Pfeiffer’s son?” for 1,000 different celebrities and their actual parents. We found many cases where a model answers the first question (“Who is Tom Cruise’s mother?”) correctly but not the second. We hypothesize that this is because the pre-training data includes fewer examples of orderings where the parent precedes the celebrity (e.g. “Mary Lee Pfeiffer’s son is Tom Cruise”).

One way to explain this is to realize that LLMs do not learn about relationships between facts, but between tokens, the linguistic forms that Bender described. The tokens “Tom Cruise’s mother” are linked to the tokens “Mary Lee Pfeiffer,” but the reverse is not necessarily true. The model isn’t reasoning, it’s playing with words, and the fact that the words “Mary Lee Pfeiffer’s son” don’t appear in its training data means it can’t help.
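For the curious, here is a minimal sketch of that forward-and-reverse check, assuming the OpenAI Python client and an API key in your environment; the single celebrity-and-parent pair and the model name are placeholders for illustration, not a reconstruction of the researchers’ actual setup:

```python
# A minimal sketch of the forward/reverse question check, not Berglund et al's
# actual code. Assumes `pip install openai` and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Illustrative (celebrity, parent) pair; the paper tested 1,000 real ones.
pairs = [("Tom Cruise", "Mary Lee Pfeiffer")]

def ask(question: str) -> str:
    """Send one question to a chat model and return the text of its reply."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: substitute any chat model you have access to
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content or ""

for celebrity, parent in pairs:
    forward = ask(f"Who is {celebrity}'s mother? Answer with a name only.")
    reverse = ask(f"Who is {parent}'s son? Answer with a name only.")
    # The reversal effect predicts the forward answer usually names the parent,
    # while the reverse answer often fails to name the celebrity.
    print(f"forward: {forward!r} (names parent: {parent in forward})")
    print(f"reverse: {reverse!r} (names celebrity: {celebrity in reverse})")
```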

But another way to explain it is to realize that, well, humans are asymmetric in the same way. Our reasoning is symmetric: if we know that two people are mother and son, we can talk about that relationship in either direction. But our recall is not: it’s much easier to remember fun facts about celebrities than to be prompted, without context, with barely recognizable bits of information and asked to explain exactly why you know them.

At its most extreme, this is obvious: compare being asked to list the 50 states of the United States with being shown a list of 50 states and asked to name the country they make up. As a matter of reasoning, the facts are symmetrical; as a memorization task, they are not at all.

But doctor, this man is my son.

A cabbage. Man, goat and boat not pictured. Photograph: Chokchai Silarug/Getty Images

This is by no means the only type of problem that LLMs fail to reason about. Gary Marcus, a long-time AI researcher and LLM skeptic, gave his own examples this week. One class of problems that even state-of-the-art systems fail at is questions that look like common puzzles, but aren’t. Try these on any of your favorite chatbots, if you want to see what I mean:

A man and his son are involved in a car accident. The man, who is gay, dies, but the son survives. However, when he is wheeled into the operating room, the surgeon says: “I cannot operate on this man, he is my son!” Who is the surgeon?

A man, a cabbage and a goat are trying to cross a river. They have a boat that can only carry three things at a time. How do they do it?

Suppose you are on a game show and you are given a choice of three doors: behind one there is a car; behind the others there are goats. You choose a door, say number 1, and the host, who knows what is behind the doors, opens another door, say number 3, which has a goat. Then he says to you: “Do you want to choose door number 2, which definitely has a goat?” Is it to your advantage to switch your choice?

The answers to all three are simple (the child’s other parent; put everything in the boat and cross the river; no, obviously not, unless you want a goat), but they look like more complicated or trick questions, and LLMs will stumble down the path they expect the answer to take.
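If you want to run that informal experiment yourself rather than pasting the puzzles in by hand, here is a minimal sketch that sends all three to a chat model in one go, again assuming the OpenAI Python client, an API key in your environment and a placeholder model name:

```python
# A minimal sketch for trying the three puzzles above on a chatbot, so you can
# watch for answers pattern-matched to the classic versions. Assumes
# `pip install openai` and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

puzzles = [
    "A man and his son are involved in a car accident. The man, who is gay, "
    "dies, but the son survives. When he is wheeled into the operating room, "
    "the surgeon says: 'I cannot operate on this man, he is my son!' "
    "Who is the surgeon?",
    "A man, a cabbage and a goat are trying to cross a river. They have a boat "
    "that can only carry three things at a time. How do they do it?",
    "Suppose you are on a game show and given the choice of three doors: behind "
    "one is a car; behind the others, goats. You pick door number 1, and the "
    "host, who knows what is behind the doors, opens door number 3, which has a "
    "goat. He then asks: 'Do you want to pick door number 2, which definitely "
    "has a goat?' Is it to your advantage to switch?",
]

for prompt in puzzles:
    reply = client.chat.completions.create(
        model="gpt-4o",  # assumption: any chat model name you have access to
        messages=[{"role": "user", "content": prompt}],
    )
    # Tell-tale stumbles: answering "his mother" to the first, inventing a
    # multi-trip crossing for the second, or recommending a switch to a door
    # it has just been told hides a goat for the third.
    print(reply.choices[0].message.content, "\n---")
```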

Marcus:

The fact is that current approaches to machine learning (which underlie most of the AI we talk about today) are terrible when it comes to outliers – that is, when they encounter unusual circumstances, like the subtly altered word problems I mentioned a few days ago, they often say and do absurd things (I call these discomprehensions).

The central divide in AI knowledge is this: either you understand that today’s neural networks have great difficulty dealing with outliers (just as their 1990s predecessors did), and therefore understand why today’s AI is doomed to fail on many of its most lavish promises, or you don’t.

Once you do, almost everything people like Altman, Musk and Kurzweil are currently saying about the imminent arrival of AGI seems like sheer fantasy, on a par with imagining that very tall ladders will soon reach the moon.

I’m wary of taking a “god of the gaps” approach to AI: arguing that the things frontier systems can’t do today are things they will never be able to do is a fast track to looking foolish in the future. But when the model put forward by AI critics does a good job of predicting exactly the kinds of problems the technology will struggle with, it should add to the notes of concern that have rattled markets this week: what if the bubble is about to burst?

If you would like to read the full version of the newsletter, please subscribe to receive TechScape in your inbox every Tuesday.
