Generative AI is an umbrella term for any automated process that uses algorithms to produce, manipulate, or synthesize data, often in the form of images or human-readable text. It is called generative because the AI creates something that didn’t exist before. That’s what makes it different from discriminative AI, which distinguishes between different kinds of input. In other words, discriminative AI tries to answer a question like “Is this image a drawing of a rabbit or a lion?” while generative AI responds to prompts like “Draw me a picture of a lion and a rabbit sitting side by side.”
This article introduces you to generative AI and its use in popular models such as ChatGPT and DALL-E. We’ll also consider the limitations of the technology, including why “too many fingers” has become a dead giveaway for artificially generated art.
The rise of generative AI
Generative AI has been around for years, arguably ever since ELIZA, a chatbot that simulated talking to a therapist, was developed at MIT in 1966. But years of work on AI and machine learning have recently come to fruition with the release of new generative AI systems. You’ve almost certainly heard of ChatGPT, a text-based AI chatbot that produces remarkably human-sounding prose. DALL-E and Stable Diffusion have also drawn attention for their ability to create vivid, realistic images from text prompts. We often refer to these and similar systems as models because they represent an attempt to simulate or model some aspect of the real world based on a (sometimes very large) subset of information about it.
The output of these systems is so uncanny that many people are asking philosophical questions about the nature of consciousness – and are concerned about the economic impact of generative AI on human jobs. But while all of these artificial intelligence creations are undeniably big news, there’s arguably less going on beneath the surface than some might think. We’ll get to some of those big questions in a moment. First, let’s take a look at what’s going on under the hood of models like ChatGPT and DALL-E.
How does generative AI work?
Generative AI uses machine learning to process a huge amount of visual or textual data, much of it pulled from the internet, and then determine which things are most likely to appear near other things. Much of generative AI’s programming goes into creating algorithms that can discern the “things” of interest to the creators of the AI: words and sentences in the case of chatbots like ChatGPT, or visual elements for DALL-E. But essentially, generative AI creates its output by assessing a huge corpus of data it has been trained on, then responding to prompts with something that falls within the range of probability as determined by that corpus.
Auto-complete — when your cell phone or Gmail suggests what the rest of the word or phrase you’re typing might be — is a low-level form of generative AI. Models like ChatGPT and DALL-E just take the idea to much more advanced heights.
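To make that concrete, here is a minimal sketch of autocomplete-style next-word prediction in Python. The corpus, function names, and suggestion rule are all invented for illustration; real systems use vastly larger corpora and far more sophisticated models, but the core idea is the same: count what tends to follow what, then suggest the most probable continuation.

```python
from collections import Counter, defaultdict

def build_bigram_model(corpus):
    """Count, for each word, which words follow it in the corpus."""
    words = corpus.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def suggest_next(model, word):
    """Suggest the word observed most often after the given word."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

model = build_bigram_model(
    "the cat sat on the mat and the cat slept on the sofa"
)
print(suggest_next(model, "the"))  # "cat" — it followed "the" most often
```

The model has no idea what a cat is; it only knows that, in the text it has seen, “cat” is the likeliest word to come after “the.” Chatbots operate on the same statistical principle, just over enormously longer contexts.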
Generative AI model training
The process of developing models to accommodate all this data is called training. A number of underlying techniques come into play here, depending on the type of model. ChatGPT uses what’s called a transformer (that’s what the T stands for). A transformer derives meaning from long sequences of text to understand how different words or semantic components might be related to one another, then determines how likely they are to occur near each other. These transformers are run unsupervised on a huge corpus of natural language text in a process called pre-training (that’s the P in ChatGPT), before being refined by human beings interacting with the model.
Another technique used to train models is the generative adversarial network, or GAN. In this technique, two algorithms compete against each other. One generates text or images based on probabilities derived from a large data set; the other is a discriminative AI, trained by humans to judge whether that output is real or AI-generated. The generative AI repeatedly tries to “trick” the discriminative AI, automatically adapting to favor successful outcomes. Once the generative AI consistently “wins” this competition, the discriminative AI is refined by humans and the process begins again.
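The adversarial loop can be shown in miniature. In this toy Python version, everything (the two players, the one-dimensional “data,” the update rules) is invented for illustration: the “discriminator” just scores how close a number is to its running estimate of the real data, and the “generator” keeps any random tweak that scores better. Real GANs use neural networks trained by gradient descent on both sides, but the shape of the competition is the same.

```python
import random

REAL_MEAN = 5.0  # the "real data" is numbers drawn near 5.0

def real_sample():
    return random.gauss(REAL_MEAN, 0.5)

class Discriminator:
    """Scores a number as more 'real-looking' the closer it is
    to this critic's current estimate of the real data."""
    def __init__(self):
        self.estimate = 0.0
    def score(self, x):
        return -abs(x - self.estimate)
    def train(self, real_points):
        # Refine on real data: move the estimate toward the observed mean.
        self.estimate = sum(real_points) / len(real_points)

class Generator:
    """Produces numbers and nudges itself toward whatever fools the critic."""
    def __init__(self):
        self.mean = 0.0
    def train(self, critic):
        # Try a small random tweak; keep it if the critic scores it higher.
        candidate = self.mean + random.uniform(-0.5, 0.5)
        if critic.score(candidate) > critic.score(self.mean):
            self.mean = candidate

random.seed(42)
d, g = Discriminator(), Generator()
for _ in range(500):
    d.train([real_sample() for _ in range(16)])  # critic studies real data
    g.train(d)                                   # generator adapts to fool it
print(round(g.mean, 1))  # ends up close to 5.0, the mean of the real data
```

After enough rounds, the generator’s output is statistically hard to tell apart from the real data, which is exactly the win condition the adversarial setup is designed to produce.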
One of the most important things to keep in mind is that while there is human intervention in the training process, most of the learning and adaptation happens automatically. It takes so many iterations to get the models to produce interesting results that automation is essential. The process is quite computationally intensive.
Is generative AI conscious?
The math and coding required to create and train generative AI models is quite complex and well beyond the scope of this article. But when you interact with the models that are the end result of this process, the experience can be decidedly eerie. You can have DALL-E produce things that look like real works of art. You can have conversations with ChatGPT that feel like a conversation with another human being. Have researchers really created a thinking machine?
Chris Phipps, a former IBM natural language processing leader who worked on Watson AI products, says no. He describes ChatGPT as a “very good prediction engine”.
It is very good at predicting what people will find coherent. It’s not always coherent (usually it is), but that’s not because ChatGPT “gets” it. It’s the opposite: people who consume the output are very good at making whatever implicit assumptions we need to make the output make sense.
Phipps, who is also a comedy performer, draws a comparison to a common improv game called Mind Meld.
Two people each come up with a word and then say it out loud at the same time – you might say “boot” and I say “tree.” We came up with those words completely independently, and at first they had nothing to do with each other. The next two participants take those two words and try to think of something the two have in common, saying it out loud at the same time. The game continues until two participants say the same word.
Maybe two people both say “lumberjack.” It seems like magic, but really it’s that we use our human brains to reason about the input (“boot” and “tree”) and find a connection. We do the work of understanding, not the machine. There’s a lot more of that human work going on with ChatGPT and DALL-E than people admit. ChatGPT can write a story, but we humans do a lot of work to make that story make sense.
Test the limits of computer intelligence
Certain prompts we can give these AI models make Phipps’ point pretty clear. For example, consider the riddle “Which weighs more, a pound of lead or a pound of feathers?” The answer, of course, is that they weigh the same (one pound), even though our instinct or common sense might tell us that the feathers are lighter.
ChatGPT will answer this riddle correctly, and you might assume it does so because it’s a coldly logical computer with no “common sense” to trip it up. But that’s not what’s going on under the hood. ChatGPT doesn’t reason its way to the answer; it simply generates output based on its predictions of what should follow a question about a pound of feathers and a pound of lead. Since its training set includes a lot of text explaining the riddle, it assembles a version of the correct answer. But if you ask ChatGPT whether two pounds of feathers are heavier than a pound of lead, it will confidently tell you they weigh the same amount, because that is still the most likely output for a prompt about feathers and lead, based on its training set. It can be fun to tell the AI it’s wrong and watch it flounder in response; I got it to apologize for its mistake and then suggest that two pounds of feathers weigh four times as much as a pound of lead.
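The failure mode can be caricatured in a few lines of Python. This hypothetical toy “prediction engine” (the keyword table and matching rule are invented; ChatGPT is nothing like this crude) replies with whatever answer most often followed similar-looking prompts in its “training” text, without ever doing arithmetic on the quantities involved:

```python
# Canned answers keyed by the surface features of prompts they followed
# in training; quantities like "two pounds" are never examined.
TRAINING_ANSWERS = {
    ("feathers", "lead"): "They weigh the same.",  # the classic riddle's reply
}

def predict(prompt):
    """Return the most statistically likely reply for a similar prompt."""
    words = set(prompt.lower().replace("?", "").split())
    for keywords, answer in TRAINING_ANSWERS.items():
        if all(k in words for k in keywords):
            return answer
    return "I'm not sure."

print(predict("Which weighs more, a pound of lead or a pound of feathers?"))
# "They weigh the same."  — correct, for the wrong reason
print(predict("Are two pounds of feathers heavier than a pound of lead?"))
# "They weigh the same."  — wrong, but still the statistically likely reply
```

A system that matched answers to surface patterns this way would get the classic riddle right and the two-pound variant wrong for exactly the same reason, which is the behavior Phipps describes.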