OpenAI wants AI to help humans train AI

One of the key ingredients that made ChatGPT a smashing success was an army of human trainers who gave the AI model behind the bot guidance on what constitutes good and bad output. OpenAI now says that adding even more AI to the mix, to assist those human trainers, could help make AI helpers smarter and more reliable.

When developing ChatGPT, OpenAI pioneered the use of reinforcement learning with human feedback, or RLHF. This technique uses information from human evaluators to fine-tune an AI model so that its output is considered more consistent, less objectionable, and more accurate. The ratings given by the trainers feed an algorithm that drives the model’s behavior. The technique has proven crucial both in making chatbots more reliable and useful and in preventing them from misbehaving.
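
For readers who want a concrete picture of that feedback loop, here is a heavily simplified sketch in Python. It shows the core idea only: human preference ratings train a small reward model, whose scores then steer the chatbot during fine-tuning. The classes, data, and dimensions are illustrative placeholders, not OpenAI's actual code.

```python
# Minimal sketch of the RLHF idea described above, using PyTorch.
# Everything here is a toy stand-in for illustration purposes.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Tiny stand-in for a reward model that scores a response embedding."""
    def __init__(self, dim=16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, response_embedding):
        return self.score(response_embedding)

# Each human rating becomes a preference pair: for the same prompt, the
# trainer preferred one response over another.
preferred = torch.randn(8, 16)   # embeddings of responses trainers liked
rejected  = torch.randn(8, 16)   # embeddings of responses trainers disliked

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

for _ in range(100):
    # Preference loss: push the preferred response's score above the rejected one's.
    loss = -torch.nn.functional.logsigmoid(
        reward_model(preferred) - reward_model(rejected)
    ).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The trained reward model then plays the role of the "algorithm that drives
# the model's behavior": a reinforcement-learning step fine-tunes the chatbot
# to produce responses the reward model scores highly.
```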

“RLHF works very well, but it has some key limitations,” says Nat McAleese, an OpenAI researcher involved in the new work. For one thing, human feedback can be inconsistent. For another, it can be difficult even for trained humans to grade extremely complex output, such as sophisticated software code. The process can also optimize a model to produce results that look convincing rather than results that are actually accurate.

OpenAI developed a new model by fine-tuning its most powerful offering, GPT-4, to assist human trainers tasked with evaluating code. The company found that the new model, dubbed CriticGPT, could detect bugs that humans missed, and that human judges found its critiques of code to be better than those written by people 63 percent of the time. OpenAI will look to expand the approach to areas beyond code in the future.
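
The sketch below shows roughly what that trainer-assist workflow might look like in practice. CriticGPT itself is not a publicly available model, so this example uses OpenAI's public Python client with a generic chat model as a hypothetical stand-in critic; the prompt, snippet, and helper function are assumptions for illustration only.

```python
# Illustrative only: CriticGPT is not public, so a generic chat model
# stands in for the critic here.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def critique_code(code: str) -> str:
    """Ask a model to point out possible bugs, for a human trainer to verify."""
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in; the article's CriticGPT is a fine-tuned GPT-4 variant
        messages=[
            {"role": "system",
             "content": "You are a code reviewer. List concrete bugs and explain why each is a bug."},
            {"role": "user", "content": code},
        ],
    )
    return response.choices[0].message.content

snippet = """
def average(values):
    return sum(values) / len(values)   # crashes on an empty list
"""

# The critique is shown to the human trainer alongside the code; the trainer
# still makes the final judgment, but starts from the machine-written review.
print(critique_code(snippet))
```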

“We are starting to work on integrating this technique into our RLHF chat stack,” McAleese says. He notes that the approach is imperfect, since CriticGPT can also make mistakes by hallucinating errors, but adds that the technique could help make OpenAI's models, as well as tools like ChatGPT, more accurate by reducing errors in human training. He adds that it could also prove crucial in helping make AI models much smarter, because it could allow humans to help train an AI that surpasses their own abilities. “And as the models continue to get better and better, we suspect people will need more help,” McAleese says.

The new technique is one of many being developed now to improve large language models and extract more abilities from them. It’s also part of an effort to ensure that AI behaves acceptably even as it becomes more capable.

Earlier this month, Anthropic, an OpenAI rival founded by former OpenAI employees, announced a more capable version of its own chatbot, called Claude, thanks to improvements to the model’s training regimen and the data it receives. Anthropic and OpenAI have also recently touted new ways to inspect AI models to understand how they arrive at their output in order to better prevent unwanted behavior like deception.

The new technique could help OpenAI train increasingly powerful AI models while ensuring their output is more reliable and better aligned with human values, especially if the company successfully deploys it in areas beyond code. OpenAI has said it is training its next big AI model, and the company is evidently eager to show it is serious about ensuring that the model behaves well. This follows the dissolution of a prominent team dedicated to assessing the long-term risks posed by AI. The team was co-led by Ilya Sutskever, a company co-founder and former board member who briefly ousted CEO Sam Altman before backing down and helping him regain control. Several members of that team have since criticized the company for acting riskily as it rushes to develop and commercialize powerful AI algorithms.

Dylan Hadfield-Menell, an MIT professor who researches ways to align AI, says the idea of AI models helping train more powerful ones has been around for a while. “This is a pretty natural development,” he says.

Hadfield-Menell notes that the researchers who originally developed the techniques used for RLHF discussed related ideas several years ago. He says it remains to be seen how generally applicable and powerful the new approach is. “It could lead to great advances in individual capabilities and could be a stepping stone toward more effective feedback in the long term,” he says.
